Database Analytics Functions

Database analytics functions enable you to perform advanced analysis and data mining analysis, for example, detecting anomalies, clustering data, sampling data, and affinity analysis. Analytics Functions are available when you connect to an Oracle database or Oracle Autonomous Data Warehouse.

Tutorial icon LiveLabs Sprint

To display the Database Analytics step in the data flow editor, you must connect to an Oracle database or Oracle Autonomous Data Warehouse.

Function Types Description

Dynamic Anomaly Detection

Detect anomalies in your input data without a pre-defined model. For example, you might want to highlight unusual financial transactions.

When you deploy this function with large datasets, configure the partition columns to maximize performance.

Dynamic Clustering

Cluster your input data without a pre-defined model. For example, you might want to characterize and discover customer segments for marketing purposes.

When you deploy this function with large datasets, configure the partition columns to maximize performance.

Frequent Item Set

Discover relationships in your data by identifying sets of items that often appear together. This data mining technique is also known as association rule learning, affinity analysis, or in the retail industry as market basket analysis. If you use frequent item set as a market basket analysis tool, you might find that customers who buy shampoo also buy hair conditioner.

This operation is resource intensive and its performance depends on several factors, such as input dataset volume, cardinality of transaction id, and cardinality of Item value column. To avoid potential performance degradation on the database, try with a higher value of minimum support percent (default is 0.25) and gradually reduce it to accommodate more itemsets in your output.

Sampling Data

Selects a random sample percentage of data from a table. You simply specify the percentage of data you want to sample. For example, you might want to randomly sample ten percent of your data.

Text Tokenization

Analyze textual data by breaking it down into distinct words and counting the occurrences of each word. When you run your data flow, Oracle Analytics creates a table in the database named DR$IndexName$I, which contains the token text and the token count related details. Use the DR$IndexName$I table to create a dataset.

  • Under Outputs, use the Create option next to each field to select the columns to index.

  • Under Parameters, then Text Column, click Select a column to select the field you'd like to break down into separate words. Use the Reference Column<number> options to include one or more columns in the output dataset.

The database connection that you use for your data flow requires special database privileges. Check with your administrator that:
  • Your database account has grant EXECUTE on CTXSYS.CTX_DDL to schema name.
  • You use an Oracle Analytics connection with the same username as the schema where the source table exists. This is best practice to avoid access privilege issues when the data flow runs.
  • The database table column you're analyzing has no existing CONTEXT index. If there's an existing CONTEXT index on the database table that you're analyzing, remove that index before you run the text tokenization data flow.

Time Series

Time Series is a data mining technique that forecasts target value based on a known history of target values. The input to time series analysis is a sequence of target values. It provides estimates of the target value for each period of a time window that can include up to 30 periods beyond the historical data.

The model also computes various statistics that measure the goodness of fit to historical data. These statistics are available as an additional output dataset via a parameter setting.

Note: The Time Series algorithm is only available from Oracle database version 18c onwards.

Un-pivoting Data

Transpose data that's stored in columns into row format. For example, you might want to transpose multiple columns showing a revenue metric value for each year to a single revenue column with multiple value rows for the year dimension. You simply select the metric columns to transpose and specify a name for the new column.You get a new dataset with fewer columns and more rows.

Note: To use analytic functions, make sure that the administrator has enabled analytics functions (via Console, System Systems, Performance and Compatibility, Enable Database Analytics Node in Data Flows).