Creating and Managing Dimensions with Parent-Child Hierarchies

A parent-child hierarchy is a hierarchy of members that all have the same type.

This contrasts with level-based hierarchies, where members of the same type occur only at a single level of the hierarchy.

This section contains the following topics:

About Parent-Child Hierarchies

A common real-life parent-child hierarchy occurrence is an organizational reporting hierarchy chart.

In an organizational reporting hierarchy chart, the following can apply:

Each individual in the organization is an employee.
Each employee, apart from the top-level managers, reports to a single manager.
The reporting hierarchy has many levels.

These conditions illustrate the basic features that define a parent-child hierarchy, namely:

A parent-child hierarchy is based on a single logical table, for example, the Employees table
Each row in the table contains two identifying keys, one to identify the member itself, the other to identify the parent of the member, for example, Emp_ID and Mgr_ID.

The image shows an example of a multi-level parent-child hierarchy.

Description of GUID-904079E9-38E3-4DC0-8DFA-163D271BAE64-default.gif follows

Description of the illustration GUID-904079E9-38E3-4DC0-8DFA-163D271BAE64-default.gif

The following table shows how this parent-child hierarchy could be represented by the rows and key values in an Employees table.

Emp_ID	Mgr_ID
Andrew	null
Barbara	Andrew
Carlos	Andrew
Dawn	Barbara
Emre	Barbara

You can expose logical dimensions with parent-child hierarchies to Oracle BI Answers users by creating presentation hierarchies that are based on particular logical dimensions. Creating hierarchies in the Presentation layer enables users to create hierarchy-based queries. See Working with Presentation Hierarchies and Levels.

This section contains the following topics:

About Levels and Distances in Parent-Child Hierarchies

All the dimension members of a parent-child hierarchy occur in a single logical column, unlike with level-based hierarchies.

In a parent-child hierarchy, the parent of a member is in another row in the same logical column, pointed to by the parent key. This is unlike a level-based hierarchy, where the parent of a member is in a different logical column in the same row. In other words, navigation in a parent-child hierarchy follows data values, while navigation in a level-based hierarchy follows the metadata structure.

In level-based hierarchies, each level is named, and occupies a position in the hierarchy that corresponds to a real-world attribute or category that is deemed useful for analysis. Unlike level-based hierarchies, where the number of levels is fixed at design time, there is no limit to the depth of a parent-child hierarchy, and the depth can change at run time due to new data.

Every Oracle BI Server parent-child hierarchy has two system-generated entities, Total and Detail, that are automatically defined for each parent-child hierarchy when the logical dimension is created. The Detail entity contains all the hierarchy members.

These two system-generated entities are different from the implicit, inter-member levels between ancestors and descendants in a parent-child hierarchy. These implicit levels are referred to as parent-child hierarchical levels in this section.

Closely associated with levels is the concept of distance in parent-child hierarchies. The distance of one member from another is the number of parent-child hierarchical levels from the member to an ancestor or to a descendant. For example, the distance from a member to its parent is always 1, and the distance from Andrew to Emre is 2. See About Parent-Child Hierarchies for an image of this example.

About Parent-Child Relationship Tables

In relational tables, the relationships between different members in a parent-child hierarchy are implicitly defined by the identifier key values in the associated base table.

For each Oracle BI Server parent-child hierarchy defined on a relational table, you must also explicitly define the inter-member relationships in a separate parent-child relationship table.

The parent-child relationship table must include four columns, as follows:

A column that identifies the member
A column that identifies an ancestor of the member

Note:

An ancestor is the parent of the member, or a higher-level ancestor.
A distance column that specifies the number of parent-child hierarchical levels from the member to the ancestor
A leaf column that indicates if the member is a leaf member (1=Yes, 0=No)

The column names can be user defined. The data types of the columns must satisfy the following conditions:

The member and ancestor identifier columns have the same data type as the associated columns in the logical table that contains the hierarchy members.

Note:

The example shown in the table uses text strings for readability, but you normally use integer surrogate keys for member_key and ancestor_key, if they exist in the source dimension table.
The distance and leaf columns are INTEGER columns.

Note the following about the rows in a parent-child relationship table:

Each member must have a row pointing at itself, with distance zero.
Each member must have a row pointing at each of its ancestors. For a root member, this is a termination row with null for the parent and distance values.

The table shows an example of a parent-child relationship table with rows that represent the inter-member relationships for the employees. See the figure in About Parent-Child Hierarchies.

Member_Key	Ancestor_Key	Distance	Isleaf
Andrew	Andrew	0	0
Barbara	Barbara	0	0
Carlos	Carlos	0	0
Dawn	Dawn	0	0
Emre	Emre	0	0
Andrew	null	null	0
Barbara	Andrew	1	0
Carlos	Andrew	1	1
Dawn	Barbara	1	1
Dawn	Andrew	2	1
Emre	Barbara	1	1
Emre	Andrew	2	1

You generate scripts to create and populate the parent-child relationship table through a wizard that you can invoke when you define the parent-child hierarchy. Note the following about the create and load scripts:

You run the create script only once, to create the parent-child relationship table in the data source.
You must run the load script after each time the data changes in the dimension table. Because of this, you typically call the load script in your ETL processing. The load script reloads the entire parent-child relationship table; it is not incremental.

If you do not choose to use the wizard, then you must create the parent-child relationship table manually and then import it into the Physical layer before associating it with the parent-child hierarchy. In this latter case, it is also your responsibility to populate the table with the data required to describe the inter-member relationships in the parent-child hierarchy.

Creating Dimensions with Parent-Child Hierarchies

The key elements that you must define for a parent-child hierarchy are the identifier columns for the member and the parent of the member.

This basic principle applies to all parent-child hierarchies, regardless of the data source from which the hierarchy is derived.

Parent-child hierarchies that are based on relational tables must have an accompanying parent-child relationship table. See About Parent-Child Relationship Tables andDefining Parent-Child Relationship Tables.

In the Business Model and Mapping layer of the Administration Tool, perform one of the following steps:
- Right-click a business model and select New Object > Logical Dimension > Dimension with Parent-Child Hierarchy. Note that this option is only available if there is at least one logical dimension table in the business model that has no dimension associated with it.
- Right-click a dimension table that is not associated with any dimension and select Create Logical Dimension, then select Dimension with Parent-Child Hierarchy.
In the Logical Dimension dialog, in the General tab, type a name for the dimension.
Click Browse beside the Member Key field.

The Browse window shows the logical dimension tables in the business model, each with their primary keys.
Select a Member Key for the parent-child hierarchy and click OK.
Click Browse beside the Parent Key field.

The Browse window shows the columns, other than the primary key, in the logical table that you selected in step 4.
Select a column that will be the Parent Key for the parent-child hierarchy and click OK.
If the logical table that you selected in step 4 is not from a relational table source, click OK to finish the process of creating the dimension.

If the logical table you selected in step 4 is from a relational table source, you must continue the dimension definition process by setting up the parent-child relationship table for the hierarchy.

Defining Parent-Child Relationship Tables

Use these steps to define a parent-child relationship table for parent-child hierarchies based on relational tables.

When you create the parent-child relationship table, you must choose one of the following options:

(Recommended method) Use a wizard that generates scripts to create and populate the parent-child relationship table.

When you select Create Parent-Child Relationship Table, the Generate Parent-Child Relationship Table Wizard generates SQL scripts for creating and populating the parent-child relationship table. At the end of the wizard, the Oracle BI Server stores the scripts into directories chosen during the wizard session. The scripts, when executed, make the parent-child relationship table available to the metadata repository.

In the Generate Parent-Child Relationship Table wizard, you must provide a enter name for the DDL Script to Generate the Parent-Child Table, and select the location for storing the generate script. You must also supply a name for the parent-child relationship table and select the catalog or schema for the parent-child relationship table. You can preview the generated scripts.
Select a previously-created parent-child relationship table.

The parent-child relationship table must have at least four columns that describe how the inter-member relationships are derived in the logical table selected for the hierarchy. See About Parent-Child Relationship Tables.

In the Logical Dimension dialog, click Parent-Child Settings.
Do one of the following to define the parent-child relationship table for the hierarchy:
- (Recommended method) Click Create Parent-Child Relationship Table follow the wizard prompts.
- Click Select Parent-Child Relationship Table to start the manual method of defining the parent-child relationship table for the parent-child hierarchy.
When using the manual method, select the physical table that acts as the parent-child relationship table for your parent-child hierarchy.

The table must already exist in the Physical layer.
Map the four columns from the physical parent-child relationship table to the fields in the Parent-Child Table Column Details area, as follows:
1. Select the Member Key column.
2. Select the Parent Key column.
3. Select the Relationship Distance column.
4. Select the Leaf Node Identifier column.
Click OK, then click OK again to finish the manual process of defining the parent-child relationship table.
If you used the Generate Parent-Child Relationship Table Wizard to generate create and load scripts, run the scripts to create and load the parent-child relationship table in your data source.

Modeling Aggregates for Parent-Child Hierarchies

Fact tables in level-based hierarchies might only contain facts for a single level of the hierarchy.

Facts for higher-level dimension members can be calculated by aggregating the facts from the lower-level fact table or from a higher-level summary table.

In contrast, parent-child hierarchies require data modelers to make some additional decisions related to the following:

How to store the base facts in the fact table
How to aggregate the base facts to obtain the facts for higher-level members of the parent-child hierarchy

This section describes how to store and aggregate facts for parent-child hierarchies and contains the following topics:

Storing Facts for Parent-Child Hierarchies

There are two options for storing the base facts in the fact table for parent-child hierarchies.

You can use the following options:

Store facts for only the leaf members of the parent-child hierarchy.
Store facts for members at any level of the parent-child hierarchy, including non-leaf members.

The first option is more appropriate if the facts for the non-leaf members of the parent-child hierarchy can be derived entirely from the facts of the leaf members. For example, if you have a parent-child product hierarchy in which the actual product members appear only as leaf members of the hierarchy, then it makes sense for a revenue fact table to only record revenue facts for the leaf members of this product hierarchy. The revenue figures for the non-leaf members of the product hierarchy such as the product categories can be derived entirely by aggregating the facts for the leaf product members at the bottom of the hierarchy.

The image shows example data for a situation where facts are stored only for leaf members in a parent-child hierarchy.

The following table shows example data for the dimension table PRODUCT_DIM:

MemberKey	Name	ParentKey
P1	Product1	C1
P2	Product2	C1
C1	Category1	C2
C2	Category2	C3
C3	Category3	-

The following table shows example data for the fact table REVENUE_FACTS:

ProductKey	YearKey	Revenue
P1	2011	100,000
P1	2012	105,000
P2	2011	75,000
P2	2012	80,000

The second option in which facts are stored for members at any level of the parent-child hierarchy is necessary when the facts for the non-leaf members are not completely derived from facts of the leaf members. A good example is a sales person hierarchy in which a sales person might report to a manager who is also a sales person. Each individual sales person, including the manager, could have a different revenue figure stored in the fact table.

The table shows example data for this situation.

Facts Stored for Both Leaf and Non-Leaf Members

The following table shows example data for the dimension table SALES_REP_DIM:

MemberKey	Name	ParentKey
101	Phillip	201
102	Vivian	201
201	Jacob	301
202	Audrey	301
301	Ryan	-

The following table shows example data for the fact table REVENUE_FACTS:

SalesRepKey	YearKey	Revenue
101	2012	1,200,000
102	2012	1,100,000
201	2012	250,000
202	2012	1,400,000

Another case in which storing facts for both leaf and non-leaf members is appropriate is when the rules for aggregating the parent-child hierarchy are complex, or when aggregating the hierarchy at query time is expensive and would lead to unacceptably long query response times. In this case, the fact table would store preaggregated facts for the non-leaf members in addition to the facts stored for the leaf members.

Aggregating Parent-Child Hierarchies

As a data modeler, you must determine how to aggregate the stored facts to calculate the aggregated facts for higher level members of the parent-child hierarchy.

In addition to choosing the correct aggregation function for the measure, you must decide if you need to roll up the fact values recorded for lower-level members to calculate the values for higher-level members. In some cases, rolling up the facts of lower-level members of the parent-child hierarchy makes sense. In other cases such as with a pre-aggregated fact table or a measure that is intended to show each member's individual contribution, rolling up the facts from lower-level members of the parent-child hierarchy is incorrect.

Rolling up Facts from Lower-Level Members of a Parent-Child Hierarchy

If a fact table only stores facts for the leaf members of a parent-child hierarchy or if the fact table only records each member's individual contribution, then most likely the values stored in the fact table must be rolled up to obtain the correct aggregated value for higher-level members of the parent-child hierarchy. Rolling up the facts along a parent-child hierarchy is achieved by joining the fact table to the dimension table through the parent-child relationship table, see Adding the Parent-Child Relationship Table to the Model.

For a fact table that stores facts only for the leaf members such as the product revenue fact table, this modeling technique calculates aggregate values that correctly summarize all the facts for the leaf-level members.

For a fact table that stores the individual contribution of both leaf members and non-leaf members, this technique computes a hierarchical aggregate that summarizes the individual contributions of the member and all its members.

Modeling Individual Contribution Measures

To report the individual contribution of each member, in addition, to reporting the summarized hierarchical aggregate that rolls up the individual contributions of multiple members, you must create two separate fact logical table sources. One fact logical table source maps the base fact table and the parent child relationship table. This is the logical table source for the hierarchical aggregate measure. The second fact logical table source maps only an alias of the fact table. This fact table alias should join directly with the dimension table rather that joining indirectly through the parent-child relationship table. This is the logical table source for the individual contribution measure.

Modeling Pre-aggregated Measures

Some fact tables contain pre-aggregated data that is populated for all members of the parent-child hierarchy. For example, the fact value for a root member might be populated with the aggregation of the data for all of its descendent members. It is important to ensure that queries do not aggregate the members from this dimension to avoid erroneous results.

To correctly model this type of parent-child hierarchy, you must create a parent-child relationship table to support hierarchical filter functions like IsAncestor and IsDescendant. You can join the parent-child dimension table directly with the fact table rather than joining through the parent-child relationship table to ensure that the pre-aggregated member value is returned, rather than rolling up all the descendants.

Note:

Do not modify the parent-child relationship table script to only include the self rows, because doing so would break the IsAncestor and IsDescendant functions.

To achieve the correct aggregation for dimensions of this type, you must determine what you want to see as a grand total when the parent-child hierarchy is aggregated. For example, assume that your hierarchy contains a single root member, and you want to display the pre-aggregated value for this root member. You must first create an additional fact logical table source mapped at the Total level of the parent-child hierarchy. Next, in the logical table source, create a WHERE clause filter that selects only the root member.

With this model in place, for queries that are at the Total level of the parent-child hierarchy, the Oracle BI Server selects the aggregate logical table source and applies the root member WHERE clause filter. For queries that are at the Detail level, the Oracle BI Server selects the detailed logical table source and returns the pre-aggregated member values. In either case, it does not matter how the aggregation rule is set, because there is a pre-aggregated source at each level.

Use this approach only if the queries are at the Total or Detail level of the parent-child dimension. For queries that group by some non-unique attribute of the parent-child dimension, the aggregation might not be correct. For example, if an Employee dimension has a Location attribute, and a query groups by Employee.Location, then double counting is likely because an employee often reports to other employees at the same location. Because of this, when fact tables contain pre-aggregated member values, you should avoid grouping by non-unique attributes of the parent-child dimension. If grouping by those attributes is unavoidable, then you should model them as separate dimensions.

Adding the Parent-Child Relationship Table to the Model

For measures in fact tables that are aggregated by rolling up the facts from lower-level members, you must edit Physical layer joins to include the parent-child relationship table.

You need to add the parent-child relationship table to the appropriate logical table source.

For fact tables containing pre-aggregated data for a parent-child hierarchy or for individual contribution measures, you should join the parent-child dimension table directly with the fact table rather than joining through the parent-child relationship table.

Joining the parent-child dimension table directly with the fact table ensures that the pre-aggregated value or individual contribution value is returned, rather than rolling up all the descendants. When pre-aggregated data is populated for all members, do not add the parent-child relationship table to the logical table source to avoid over counting.

In the Administration Tool, navigate to the Physical layer of the repository, right-click the appropriate physical tables and select Physical Diagram > Selected Object(s) Only.
Delete the direct joins from the dimension table to each of the fact tables.
Create a join from the parent-child relationship table to the dimension table using the ancestor key.
Create joins from the fact tables to the parent-child relationship table using the member key.
In the Business Model and Mapping layer, double-click the logical table source for the logical fact table that is used in your parent-child hierarchy.
In the General tab of the Logical Table Source dialog, click the Add button.
Browse to locate the parent-child relationship table in the Physical layer and click Select.
Click OK in the Logical Table Source dialog.

Maintaining Parent-Child Hierarchies Based on Relational Tables

For parent-child hierarchies based on relational tables, you must ensure that the data in the parent-child relationship table accurately reflects the inter-member relationships in the dimension.

You may have created scripts manually to create and populate the parent-child relationship table, or you may have used the Generate Parent-Child Relationship Table Wizard to create the scripts. You must run these scripts, adapting them if necessary, as often as required to guarantee the integrity of the parent-child relationships in the hierarchy. You typically want to add the Populate script to your ETL process so that it runs after the dimension table is updated.