|Oracle® Database Data Warehousing Guide
10g Release 2 (10.2)
|PDF · Mobi · ePub|
In large data warehouse environments, many different types of analysis can occur. In addition to SQL queries, you may also apply more advanced analytical operations to your data. Two major types of such analysis are OLAP (On-Line Analytic Processing) and data mining. Rather than having a separate OLAP or data mining engine, Oracle has integrated OLAP and data mining capabilities directly into the database server. Oracle OLAP and Oracle Data Mining (ODM) are options to the Oracle Database. This chapter provides a brief introduction to these technologies, and more detail can be found in these products' respective documentation.
The following topics provide an introduction to Oracle's OLAP and data mining capabilities:
OLAP summarizes data and makes forecasts. For example, OLAP answers questions like "What are the average sales of mutual funds, by region and by year?"
Data mining discovers hidden patterns in data. Data mining operates at a detail level instead of a summary level. Data mining answers questions like "Who is likely to buy a mutual fund in the next six months, and what are the characteristics of these likely buyers?"
OLAP and data mining can complement each other. For example, OLAP might pinpoint problems with sales of mutual funds in a certain region. Data mining could then be used to gain insight about the behavior of individual customers in the region. Finally, after data mining predicts something like a 5% increase in sales, OLAP can be used to track the net income.
Oracle OLAP uses a multidimensional data model to perform complex statistical, mathematical, and financial analysis of historical data. Oracle OLAP provides a suite of tools for database administration and application development. OLAP functionality is also incorporated in Oracle query and reporting tools.
For more information regarding OLAP, see the following documentation:
Basing an OLAP system directly on the Oracle server offers the following benefits:
Oracle Database OLAP is highly scalable. In today's environment, there is tremendous growth along three dimensions of analytic applications: number of users, size of data, complexity of analyses. There are more users of analytical applications, and they need access to more data to perform more sophisticated analysis and target marketing. For example, a telephone company might want a customer dimension to include detail such as all telephone numbers as part of an application that is used to analyze customer turnover. This would require support for multi-million row dimension tables and very large volumes of fact data. Oracle Database can handle very large data sets using parallel execution and partitioning, as well as offering support for advanced hardware and clustering.
Oracle Database includes many features that support high availability. One of the most significant is partitioning, which allows management of precise subsets of tables and indexes, so that management operations affect only small pieces of these data structures. By partitioning tables and indexes, data management processing time is reduced, thus minimizing the time data is unavailable. Another feature supporting high availability is transportable tablespaces. With transportable tablespaces, large data sets, including tables and indexes, can be added with almost no processing to other databases. This enables extremely rapid data loading and updates.
Oracle enables you to precisely control resource utilization. The Database Resource Manager, for example, provides a mechanism for allocating the resources of a data warehouse among different sets of end-users. Consider an environment where the marketing department and the sales department share an OLAP system. Using the Database Resource Manager, you could specify that the marketing department receive at least 60 percent of the CPU resources of the machines, while the sales department receive 40 percent of the CPU resources. You can also further specify limits on the total number of active sessions, and the degree of parallelism of individual queries for each department.
Another resource management facility is the progress monitor, which gives end users and administrators the status of long-running operations. Oracle Database 10g maintains statistics describing the percent-complete of these operations. Oracle Enterprise Manager enables you to view a bar-graph display of these operations showing what percent complete they are. Moreover, any other tool or any database administrator can also retrieve progress information directly from the Oracle data server, using system views.
Oracle provides a server-managed infrastructure for backup, restore, and recovery tasks that enables simpler, safer operations at terabyte scale. Some of the highlights are:
Details related to backup, restore, and recovery operations are maintained by the server in a recovery catalog and automatically used as part of these operations. This reduces administrative burden and minimizes the possibility of human errors.
Backup and recovery operations are fully integrated with partitioning. Individual partitions, when placed in their own tablespaces, can be backed up and restored independently of the other partitions of a table.
Oracle includes support for incremental backup and recovery using Recovery Manager, enabling operations to be completed efficiently within times proportional to the amount of changes, rather than the overall size of the database.
The backup and recovery technology is highly scalable, and provides tight interfaces to industry-leading media management subsystems. This provides for efficient operations that can scale up to handle very large volumes of data. Open platforms for more hardware options and enterprise-level platforms.
Just as the demands of real-world transaction processing required Oracle to develop robust features for scalability, manageability and backup and recovery, they lead Oracle to create industry-leading security features. The security features in Oracle have reached the highest levels of U.S. government certification for database trustworthiness. Oracle's fine grained access control feature, enables cell-level security for OLAP users. Fine grained access control works with minimal burden on query processing, and it enables efficient centralized security management.
The OLAP option of the Oracle Database includes the following components:
The OLAP analytic engine, which supports the selection and rapid calculation of multidimensional data within the Oracle Database.
Analytic workspaces, which store data in a multidimensional format where it can be manipulated by the OLAP engine.
Analytic Workspace Manager, a graphical user interface for creating and maintaining analytic workspaces.
OLAP Worksheet, an interactive environment for executing OLAP DML, the data definition and manipulation language for interacting with analytic workspaces.
Interfaces for developing OLAP applications in SQL and Java.
OLAP Catalog, the metadata repository which represents a star schema as a logical cube. The OLAP Catalog enables OLAP applications to access relational data.
OracleBI Discoverer Plus OLAP is a full featured tool for business analysis that provides a variety of presentation options including charts and graphs. With Discoverer Plus OLAP you can create queries, drill, pivot, slice and dice data, add analytic calculations, chart the data, and export reports in various data formats.
OracleBI Spreadsheet Add-In combines Oracle Database analytics with the capabilities of Microsoft Excel. You can use the add-in to perform OLAP operations such as drilling, rotation, and data selection within a familiar spreadsheet environment.
The SQL interface to Oracle OLAP is implemented as PL/SQL packages and SQL functions. The PL/SQL packages include
DBMS_AW, which executes OLAP DML, and
DBMS_ODM, which creates materialized views for OLAP applications that use a relational data source. The SQL functions include
OLAP_TABLE function provides the fundamental mechanism for returning multidimensional data, stored in an analytic workspace, as a logical table.
The Java interface consists of the OLAP API, which provides Java classes for querying and analyzing OLAP data, and the Analytic Workspace API, which provides Java classes for creating and maintaining analytic workspaces. The Analytic Workspace API also supports XML representation of a logical multidimensional data model, which can be instantiated in the Database as an analytic workspace.
The Oracle Business Intelligence Beans simplifies the development of OLAP query applications by providing the functions of the OLAP API as JavaBeans. The Business Intelligence Beans are fully integrated with Oracle JDeveloper. They include presentation beans, data beans, and persistence services.
A number of database administration tasks are involved in supporting the OLAP option in the database. One of the primary tasks is the management of analytic workspaces. You can use either of the following tools:
Analytic Workspace Manager provides a user interface for extracting data from a relational schema and creating an analytic workspace in database standard form. This form enables the analytic workspace to be used with various tools for modifying the logical model, loading new data, aggregating the data, and making the data accessible to OLAP applications.
Oracle Warehouse Builder can extract data from many different sources, transform it into a relational schema, and create a standard form analytic workspace. To further define the contents of the workspace, you can use Analytic Workspace Manager.
Oracle Data Mining (ODM) uses data mining algorithms to sift through the large volumes of data generated by businesses to produce, evaluate, and deploy predictive and descriptive models. It enriches mission critical applications in CRM, manufacturing, inventory management, customer service and support, Web portals, wireless devices, and other fields with context-specific recommendations and predictive monitoring of critical processes.
ODM finds valuable information that can help users better understand customers or clients and anticipate customer behavior. ODM insights can be revealing, significant, and valuable. For example, ODM can be used to
Predict those customers likely to change service providers
Discover the factors involved with a disease
Identify fraudulent behavior
Oracle Data Mining provides in-database mining that does not require data movement between the database and an external mining server, thereby eliminating redundancy, improving efficient data storage and processing, and maintaining data security. Mining in the database makes it easier to mine up-to-date data.
ODM supports major mining functions: Classification, Regression, Clustering, Association, Attribute Importance, and Feature Extraction. There are several algorithms for each mining function.
Oracle Data Mining supports the following data mining functions:
Supervised data mining:
Classification: Grouping items into discrete classes and predicting which class an item belongs to; classification algorithms are Decision Tree, Naive Bayes, Adaptive Bayes Network, and Support Vector Machine
Regression: Approximating and forecasting continuous numerical values; the algorithm for regression is Support Vector Machine
Attribute Importance: Identifying the attributes that are most important in predicting results; the algorithm for attribute importance models is Minimum Descriptor Length
Unsupervised data mining:
Clustering: Finding natural groupings in the data; the algorithms for clustering are k-Means and O-Cluster
Associations: Analyzing "market baskets", items that are likely to be purchased together; the algorithm for associations is A Priori
Feature extraction: Creating new attributes (features) as a combination of the original attributes; the algorithm for Feature Extraction is Non-Negative Matrix Factorization
In addition to mining structured data, ODM permits mining of one or more columns of text data.
Oracle Data Mining also supports specialized sequence search and alignment algorithms (BLAST) used to detect similarities between nucleotide and amino acid sequences.
ODM provides extensive support for building applications that automate the extraction and dissemination of data mining insights. There are two kinds of interfaces:
Programmatic interfaces for data mining
Graphical user interface for data mining
ODM programmatic interfaces include
Data mining functions in Oracle SQL for high performance scoring of data
PL/SQL interface for model creation, description, analysis, and deployment
Java interface based on the Java Data Mining standard for model creation, description, analysis, and deployment
DBMS_PREDICTIVE_ANALYTICS PL/SQL package that automates classification and attribute importance
Oracle Data Miner is a GUI that guides users through model creation, description, and analysis. It also supports many data preparation transformations. Models built using Oracle Data Miner can be exported to Java code, which can be used to deploy models. Oracle Data Miner is available as a download from Oracle Technology Network.
ODM's data mining facilities can be used to solve problems in the life sciences. For example, data mining can be used to discover gene and protein targets and to identify leads for new drugs. Also, ODM supports sequence similarity search and annotation using BLAST.
For more information about ODM, see the following documentation:
The "Data Mining Functions" section in the Oracle SQL Reference
DBMS_PREDICTIVE_ANALYTICS in Oracle Database PL/SQL Packages and Types Reference
Oracle Data Mining Java API Reference (Javadoc)