|Oracle9i Data Warehousing Guide
Release 2 (9.2)
Part Number A96520-01
In large data warehouse environments, many different types of analysis can occur. In addition to SQL queries, you may also apply more advanced analytical operations to your data. Two major types of such analysis are OLAP (On-Line Analytic Processing) and data mining. Rather than having a separate OLAP or data mining engine, Oracle has integrated OLAP and data mining capabilities directly into the database server. Oracle OLAP and Oracle Data Mining are options to the Oracle9i Database. This chapter provides a brief introduction to these technologies, and more detail can be found in these products' respective documentation.
The following topics provide an introduction to Oracle's OLAP and data mining capabilities:
Oracle9i OLAP User's Guide for further information regarding OLAP and Oracle Data Mining documentation for further information regarding data mining
Oracle9i OLAP adds the query performance and calculation capability previously found only in multidimensional databases to Oracle's relational platform. In addition, it provides a Java OLAP API that is appropriate for the development of internet-ready analytical applications. Unlike other combinations of OLAP and RDBMS technology, Oracle9i OLAP is not a multidimensional database using bridges to move data from the relational data store to a multidimensional data store. Instead, it is truly an OLAP-enabled relational database. As a result, Oracle9i provides the benefits of a multidimensional database along with the scalability, accessibility, security, manageability, and high availability of the Oracle9i database. The Java OLAP API, which is specifically designed for internet-based analytical applications, offers productive data access.
Oracle9i OLAP User's Guide for further information regarding OLAP
Basing an OLAP system directly on the Oracle server offers the following benefits:
Oracle9i OLAP is highly scalable. In today's environment, there is tremendous growth along three dimensions of analytic applications: number of users, size of data, complexity of analyses. There are more users of analytical applications, and they need access to more data to perform more sophisticated analysis and target marketing. For example, a telephone company might want a customer dimension to include detail such as all telephone numbers as part of an application that is used to analyze customer turnover. This would require support for multi-million row dimension tables and very large volumes of fact data. Oracle9i can handle very large data sets using parallel execution and partitioning, as well as offering support for advanced hardware and clustering.
Oracle9i includes many features that support high availability. One of the most significant is partitioning, which allows management of precise subsets of tables and indexes, so that management operations affect only small pieces of these data structures. By partitioning tables and indexes, data management processing time is reduced, thus minimizing the time data is unavailable. Another feature supporting high availability is transportable tablespaces. With transportable tablespaces, large data sets, including tables and indexes, can be added with almost no processing to other databases. This enables extremely rapid data loading and updates.
Oracle enables you to precisely control resource utilization. The Database Resource Manager, for example, provides a mechanism for allocating the resources of a data warehouse among different sets of end-users. Consider an environment where the marketing department and the sales department share an OLAP system. Using the Database Resource Manager, you could specify that the marketing department receive at least 60 percent of the CPU resources of the machines, while the sales department receive 40 percent of the CPU resources. You can also further specify limits on the total number of active sessions, and the degree of parallelism of individual queries for each department.
Another resource management facility is the progress monitor, which gives end users and administrators the status of long-running operations. Oracle9i maintains statistics describing the percent-complete of these operations. Oracle Enterprise Manager enables you to view a bar-graph display of these operations showing what percent complete they are. Moreover, any other tool or any database administrator can also retrieve progress information directly from the Oracle data server, using system views.
Oracle provides a server-managed infrastructure for backup, restore, and recovery tasks that enables simpler, safer operations at terabyte scale. Some of the highlights are:
Oracle9i Recovery Manager User's Guide for further details
Just as the demands of real-world transaction processing required Oracle to develop robust features for scalability, manageability and backup and recovery, they lead Oracle to create industry-leading security features. The security features in Oracle have reached the highest levels of U.S. government certification for database trustworthiness. Oracle's fine grained access control feature, enables cell-level security for OLAP users. Fine grained access control works with minimal burden on query processing, and it enables efficient centralized security management.
Oracle enables data mining inside the database for performance and scalability. Some of the capabilities are:
Oracle9i Data Mining provides a Java API to exploit the data mining functionality that is embedded within the Oracle9i database.
By delivering complete programmatic control of the database in data mining, Oracle Data Mining (ODM) delivers powerful, scalable modeling and real-time scoring. This enables e-businesses to incorporate predictions and classifications in all processes and decision points throughout the business cycle.
ODM is designed to meet the challenges of vast amounts of data, delivering accurate insights completely integrated into e-business applications. This integrated intelligence enables the automation and decision speed that e-businesses require in order to compete today.
Oracle Data Mining uses data mining algorithms to sift through the large volumes of data generated by e-businesses to produce, evaluate, and deploy predictive models. It also enriches mission critical applications in CRM, manufacturing control, inventory management, customer service and support, Web portals, wireless devices and other fields with context-specific recommendations and predictive monitoring of critical processes. ODM delivers real-time answers to questions such as:
Oracle Data Mining performs all the phases of data mining within the database. In each data mining phase, this architecture results in significant improvements including performance, automation, and integration.
Data preparation can create new tables or views of existing data. Both options perform faster than moving data to an external data mining utility and offer the programmer the option of snap-shots or real-time updates.
Oracle Data Mining provides utilities for complex, data mining-specific tasks. Binning improves model build time and model performance, so ODM provides a utility for user-defined binning. ODM accepts data in either single record format or in transactional format and performs mining on transactional formats. Single record format is most common in applications, so ODM provides a utility for transforming single record format.
Associated analysis for preparatory data exploration and model evaluation is extended by Oracle's statistical functions and OLAP capabilities. Because these also operate within the database, they can all be incorporated into a seamless application that shares database objects. This allows for more functional and faster applications.
Oracle Data Mining provides four algorithms: Naïve Bayes, Decision Tree, Clustering, and Association Rules. These algorithms address a broad spectrum of business problems, ranging from predicting the future likelihood of a customer purchasing a given product, to understand which products are likely be purchased together in a single trip to the grocery store. All model building takes place inside the database. Once again, the data does not need to move outside the database in order to build the model, and therefore the entire data-mining process is accelerated.
Models are stored in the database and directly accessible for evaluation, reporting, and further analysis by a wide variety of tools and application functions. ODM provides APIs for calculating traditional confusion matrixes and lift charts. It stores the models, the underlying data, and these analysis results together in the database to allow further analysis, reporting and application specific model management.
Oracle Data Mining provides both batch and real-time scoring. In batch mode, ODM takes a table as input. It scores every record, and returns a scored table as a result. In real-time mode, parameters for a single record are passed in and the scores are returned in a Java object.
In both modes, ODM can deliver a variety of scores. It can return a rating or probability of a specific outcome. Alternatively it can return a predicted outcome and the probability of that outcome occurring. Some examples follow.
The Oracle Data Mining API lets you build analytical models and deliver real-time predictions in any application that supports Java. The API is based on the emerging JSR-073 standard.