Bookshelf v10.1.3.2: Identifying the Database Content For The Business Model

Oracle^®Business Intelligence Server Administration Guide > Planning and Creating an Oracle BI Repository > Process of Oracle BI Repository Planning and Design >

Identifying the Database Content For The Business Model

This topic is part of the Process of Oracle BI Repository Planning and Design.

The Oracle BI Server provides an interface that allows you to map the Oracle BI repository to your underlying physical databases. Sometimes you can control the physical design of the underlying databases. However, sometimes the database already exists and you need to work with what is there. In either case, you need to understand the structure and content of your physical databases.

This section discusses the following topics:

About Types of Physical Schemas

There are two types of physical schemas (models): entity-relationship (E-R) schemas and dimensional schemas. E-R schemas are designed to minimize data storage redundancy and optimize data updates. Dimensional schemas are designed to enhance understandability and to optimize query performance.

Entity-Relationship (E-R) Schemas. The entity-relationship (E-R) schema is the classic, fully normalized relational schema used in many online transaction processing (OLTP) systems. The relationships between tables signify relationships between data, not necessarily business relationships.
Typically, E-R schemas have many tables, sometimes hundreds or even thousands. There are many tables because the data has been carefully taken apart (normalized, in database terminology) with the primary goal of reducing data redundancy and bringing about fast update performance. E-R schemas are very efficient for OLTP databases. When E-R databases are queried, joins are usually predetermined and can be optimized. E-R databases are usually queried by applications that know exactly where to go and what to ask. These applications typically query small units of information at a time, such as a customer record, an order, or a shipment record.

E-R schemas generally do not work well for queries that perform historical analysis due to two major problems: poor performance and difficulty in posing the question in SQL.
- Performance problems persist with historical E-R queries because the queries require the database to put the data back together again; this is a slow, complex process. Furthermore, because the cost-based optimizers in database management systems are not designed for the level of complexity in such a query, they can generate query plans that result in poor performance.
- A Database Analyst (DBA) who is very familiar with the data might be able to write a SQL query against an E-R schema that can theoretically answer a business question, but such detailed knowledge of the data is generally beyond the scope of the end-user business analyst. Even when the SQL is crafted properly, there is often an unacceptably high response time in returning results to the user.
Dimensional Schemas. A dimensional schema is a denormalized schema that follows the business model. Dimensional schemas contain dimension tables and fact tables. Dimension tables contain attributes of the business, and fact tables contain individual records with a few facts and foreign keys to each of the dimension tables.
Dimensional schemas are used for business analysis and have two major advantages over E-R schemas for decision support:
- Better query performance
- Easier to understand
  Dimensional schemas are not nearly as efficient as E-R schemas for updating discrete records, but they work well for queries that analyze the business across multiple dimensions.
  
  The following are two types of dimensional schemas:
- Star schema. A star schema is a dimensional schema with a single fact table that has foreign key relationships with several dimension tables.
  - The dimension tables mirror the business model and contain columns with descriptive attributes, such as Product, Size, and Color in the sample Products dimension. Dimension tables also have a key column (or columns) that uniquely identifies each row in the table.
  - The fact table has a multipart primary key, often made up of the foreign key references to the dimension tables. The fact table also contains all the measures, or facts, used to measure business performance. The lowest level of detail stored is the granularity of the fact table. Information at higher levels of aggregation is either calculated from the detail level records or precomputed and stored in separate aggregate fact tables, resulting in a multiple-star schema. For a discussion of aggregate fact tables, read Identifying Fact Tables.
- Snowflake schema. A snowflake schema is a dimensional schema where one or more of the dimensions are normalized to some extent.
  NOTE: It is recommended that you minimize the use of snowflake schemas.
  
  The difference between the type of normalization in a snowflake schema and in an E-R schema is that the snowflake normalization is based on business hierarchy attributes. The tables snowflaked off the dimensions have parent-child relationships with each other that mimic the dimensional hierarchies. In a snowflake schema, multiple logical tables are considered a single logical table.
  
  For example, Figure 6 contains a snowflake schema showing that Product Line > Products is a branch of the snowflake.
  
  Figure 6. Diagram of a Snowflake Schema
  
  One dimension hierarchy should be created for the Products branch. The following is a list of the minimal levels that should be created for this hierarchy:
  - Grand Total Level
  - Detail Level of the dimension that is not joined to the fact table. In this case, it is ProductLine.
  - Detail Level of the dimension that is joined to the fact table.
    Figure 7 shows the hierarchy that should be created.
    
    Figure 7. Hierarchy for the Products Branch in the Business Model and Mapping Layer

About Primary Key-Foreign Key Relationships

To fully understand the structure and content of your physical databases, it is important to become familiar with the concepts behind primary key-foreign key relationships.

A primary key-foreign key relationship defines a one-to-many relationship between two tables in a relational database. A foreign key is a column or a set of columns in one table that references the primary key columns in another table. The primary key is defined as a column (or set of columns) where each value is unique and identifies a single row of the table.

Consider Figure 8, where the upper table is a fact table named Sales and the lower table is a dimension table named Date. The Sales fact table contains one row for every sales transaction, and the Date dimension table contains one row for every date the database will potentially cover.

Figure 8. Primary Key-Foreign Key Sample

Because of this primary key-foreign key relationship, you can join the Sales and Date tables to combine the other attributes of the Date table with the records in the Sales table. For example, if an analyst asks for the total sales, the day of the week, the product name, and the store in which the product was sold, the information is compiled by joining the sales, date, product, and store tables through the primary key-foreign key relationships between the Sales table and the various dimension tables.

Identifying the Database Table Structure To Import

The Administration Tool provides an interface to map logical tables to the underlying physical tables in the database. Before you can map the tables, you need to identify the contents of the physical database as it relates to your business model. To do this correctly, you need to identify the following contents of the physical database:

Identify the contents of each table
Identify the detail level for each table
Identify the table definition for each aggregate table. This allows you to set up aggregate navigation. The following detail is required by the Oracle BI Server:
- The columns by which the table is grouped (the aggregation level)
- The type of aggregation (SUM, AVG, MIN, MAX, or COUNT)
  For information on how to set up aggregate navigation in a repository, refer to Setting Up Fragmentation Content in an Oracle BI Repository for Aggregate Navigation.
Identify the contents of each column
Identify how each measure is calculated
Identify the joins defined in the database

To acquire this information about the data, you could refer to any available documentation that describes the data elements created when the database was implemented, or you might need to spend some time with the DBA for each database to get this information. To fully understand all the data elements, you might also need to ask people in the organization who are users of the data, owners of the data, or the application developers of applications that create the data.

Oracle® Business Intelligence Server Administration Guide		Copyright © 2007, Oracle. All rights reserved.