Designing Your Database for OLAP, 5 of 6

Data Structures in Relational and Multidimensional Data Stores

Oracle offers both relational and multidimensional storage within a single database. Historical and derived data can be stored either in relational tables or in multidimensional objects.

Relational Table Storage

The lowest level of historical data, as well as fully aggregated historical data, can be stored in fact tables in your data warehouse. The lowest level in a data warehouse is typically at a much higher level than in the transactional database. The transactional data should be aggregated to a base level where patterns and trends can emerge and analysis is meaningful, before being stored in the data warehouse. For example, individual purchase orders might be aggregated by sales representative, zip code, or some other demographic feature.

Dimension tables, also called lookup tables, are used to store the dimension members that determine the aggregation criteria for fact data. Dimension members are typically organized in levels that roll up within hierarchies.

The Oracle RDBMS provides materialized views for storing precomputed data derived from fact tables. Materialized views significantly improve querying times because the aggregates are computed and stored as a database administration task for everyone's use, that is, when the data is refreshed rather than each time the aggregates are needed.

Multidimensional Table Storage

As an alternative to relational table storage, data can be stored in multidimensional objects in analytic workspaces. Analytic workspaces are multidimensional structures that are designed specifically to support analytic processing. The equivalent of a relational table in an analytic workspace is a variable. You can think of variables as multidimensional tables. The historical and derived data in a data warehouse can be distributed between relational tables and workspace variables. Keep in mind that there is no need to duplicate data; it can be stored in tables or variables, but it does not need to be stored in both.

You can use the sophisticated analysis tools of the OLAP DML to generate new data such as forecasts. You have the option of copying this data into relational tables or keeping it exclusively in the analytic workspace. Analytic workspaces are also an alternative to materialized views for generating and storing aggregate data.

See Also:

"Choosing a Schema for Your Data" for a discussion of the merits of these storage alternatives.

Temporary and Persistent Analytic Workspaces

Data can be loaded into analytic workspaces from SQL tables or from flat files. The analytic workspaces can be either temporary or persistent, depending on your needs. If an analytic workspace is needed only to perform a specific calculation and the results of the calculation does not need to persist in the workspace, the workspace can be discarded at the end of the session. This might occur if, for example, an application needs to forecast a small amount of sales data. Since the forecast can be rerun at any time, there might not be any point in saving the results.

Analytic workspaces can also persist across sessions. You might want to save data in an analytic workspace if you have calculated a significant amount of data (for example, a large forecast or the results of solving a model), or if you have aggregated data using non-additive aggregation methods.

Data in analytic workspaces can be shared by many different users. To share data in an analytic workspace, the workspace must be saved in the database during the period of time it is to be shared.

See Also:

Oracle9i OLAP Developer's Guide to the OLAP DML for detailed instructions on how to create and populate an analytic workspace, and how to manipulate data stored in it.

About Star, Snowflake, Parent-Child, and Multidimensional Schemas

A schema is a collection of database objects. The following types of schemas are characteristic of a relational data warehouse:

Star schema. Consists of one or more fact tables related to one or more dimension tables. The relationships are defined through foreign keys, metadata, or both.
Snowflake schema. A star schema that has been partially or fully normalized to reduce the number of duplicate values in the dimension tables. However, snowflake schema require more joins, which can slow performance.

For example, a star schema might have a single geography dimension table with four columns: city, state, region, and country. Only the city column has predominately unique values, while the other columns have increasing numbers of duplicate values.

A snowflake schema might have three related geography dimension tables: One table with two columns (city and state) that define the relationship between cities and states, a second table with two columns (state and region) that define the relationship between states and regions, and a third table with two columns (region and country) that define the relationship between regions and countries.

Star and snowflake schemas use level-based dimensions. Their hierarchies are defined by the relationship between levels, and their levels map to columns in dimension tables. Alternatively, a data warehouse schema may use parent-child dimensions. In this type of schema, dimension members map to a parent column and a child column. The parent-child combination in a given row expresses a hierarchical relationship.

Your relational tables can be organized in either a level-based schema (star or snowflake) or a parent-child schema.

With Oracle OLAP, your data warehouse storage options are extended to include:

Multidimensional schema. You can think of analytic workspaces as multidimensional schema, since a workspace stores a collection of related objects.

With analytic workspace data, the data warehouse can support multidimensional and hybrid solutions in addition to pure relational storage models. Thus, an Oracle OLAP schema can contain multidimensional analytic workspace objects in addition to fact tables and dimension tables.

Choosing a Schema for Your Data

The types of analyses performed by the OLAP applications that your data warehouse will support determine the best choice of a data repository. You must examine the benefits of each storage method in light of these applications and decide which one most closely matches their requirements. You can choose to store the data for your business analysis applications from these alternatives:

Entirely in a relational schema. During user sessions, SQL commands are used to select and manipulate the data in the relational database.

Fact tables are the preferred data repository for most query and reporting applications that require read-only access to the data. For these applications, the relational database offers scalability in supporting very large data sets efficiently and manageability with a single set of administrative tools.
Entirely in a multidimensional schema. As a routine maintenance task, data is loaded into dimensions and variables in the analytic workspace from one or more sources (including relational tables and flat files). During user sessions, data is selected and manipulated in the analytic workspace.

Analytic workspaces should be used as a persistent data store for applications that support predictive analysis functions, such as models, forecasts, and what-if scenarios. Other design choices -- such as the types of hierarchies, the use of non-additive aggregation methods, or storage issues concerning aggregate data -- may make workspace objects the preferred data repository.
Distributed between a relational schema and a multidimensional schema. The implementation of this model can, of course, vary widely since it encompasses any scheme that draws on the other two methods. A distributed solution may be desirable when an application requires the advanced calculation capabilities of the analytic workspace combined with the efficient storage of standard relational tables.

As explained in "How the OLAP API Accesses Multidimensional Data", the storage location of data is transparent to applications that use OLAP metadata to identify data objects. Thus, database administrators can fine-tune the database by moving data between relational tables and analytic workspaces without breaking existing Java applications that use the OLAP API.