Add Database Analytics to a Data Flow

Database analytics enable you to perform advanced analysis and data mining, for example, detecting anomalies, clustering data, sampling data, and affinity analysis. Database analytics are executed in the database, not in Oracle Analytics, so you must be connected to an Oracle database or Oracle Autonomous Data Warehouse.

Use the Database Analytics step in the data flow editor.
Before you start, create a connection to your Oracle database or Oracle Autonomous Data Warehouse and use it to create a data set.
  1. In the data flow editor, click Add a step (+), and select Database Analytics.
    If you aren't connected to an Oracle database or Oracle Autonomous Data Warehouse, you won't see the Database Analytics option.
  2. On the Select Database Analytics page, select the analytics operation you want to use, then click OK.
  3. On the Analytics Operation <type> pane, configure the operation.
    • Use the Inputs or Outputs area to specify the data columns to analyze.
    • Use the Parameters area to configure options for the operation.

      To help you configure the operation, use the on-screen guidance displayed for each parameter.

    Function Types
    Dynamic Anomaly Detection
    Detect anomalies in your input data without a pre-defined model. For example, you might want to highlight unusual financial transactions.

    When you deploy this function with large data sets, configure the partition columns to maximize performance.
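In the database, dynamic (model-free) analysis of this kind can be expressed as an Oracle predictive query. The following is an illustrative sketch only, with hypothetical table and column names, assuming your database release supports predictive queries:

```sql
-- Hypothetical sketch: flag unusual transactions without a pre-built model.
-- PREDICTION(OF ANOMALY USING *) builds a transient anomaly model on the fly;
-- the PARTITION BY clause plays the same role as the partition columns
-- parameter, building a separate model per partition.
SELECT txn_id,
       amount,
       PREDICTION(OF ANOMALY USING *)
         OVER (PARTITION BY region) AS anomaly_flag
FROM   transactions;
```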

    Dynamic Clustering
    Cluster your input data without a pre-defined model. For example, you might want to characterize and discover customer segments for marketing purposes.

    When you deploy this function with large data sets, configure the partition columns to maximize performance.
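Dynamic clustering maps onto the same predictive query mechanism in the database. A minimal sketch, again with hypothetical table and column names:

```sql
-- Hypothetical sketch: assign each customer to one of 4 clusters computed
-- on the fly, with a separate clustering per country (a partition column).
SELECT cust_id,
       CLUSTER_ID(INTO 4 USING *)
         OVER (PARTITION BY country) AS segment
FROM   customers;
```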

    Frequent Item Set
    Discover relationships in your data by identifying sets of items that often appear together. This data mining technique is also known as association rule learning, affinity analysis, or, in the retail industry, market basket analysis. For example, used as a market basket analysis tool, frequent item set might reveal that customers who buy shampoo also buy hair conditioner.

    This operation is resource intensive, and its performance depends on several factors, such as the volume of the input data set, the cardinality of the transaction ID column, and the cardinality of the item value column. To avoid degrading database performance, start with a higher minimum support percentage (the default is 0.25) and gradually reduce it to include more item sets in your output.
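In-database frequent item set mining is also exposed directly through Oracle's DBMS_FREQUENT_ITEMSET package, which illustrates how the minimum support parameter behaves. A hedged sketch, with hypothetical table and column names:

```sql
-- Hypothetical sketch: item sets of 2-3 items that occur together in at
-- least 0.25% of baskets (mirroring the default minimum support of 0.25).
SELECT itemset, support, length, total_tranx
FROM   TABLE(
         DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL(
           CURSOR(SELECT basket_id, item_name FROM order_lines),
           0.0025,  -- minimum support, as a fraction of all transactions
           2,       -- minimum item set length
           3,       -- maximum item set length
           NULL,    -- no items to force-include
           NULL));  -- no items to exclude
```

A higher support threshold prunes the search space early, which is why starting high and reducing gradually protects database performance.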

    Sampling Data
    Select a random sample of data from a table by specifying the percentage of data to sample. For example, you might want to randomly sample ten percent of your data.
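This operation corresponds to Oracle's SAMPLE clause. For example, to pull roughly ten percent of the rows from a hypothetical sales table:

```sql
-- Randomly sample about 10% of the rows; SEED makes the sample repeatable.
SELECT *
FROM   sales SAMPLE (10) SEED (42);
```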
    Text Tokenization
    Analyze textual data by breaking it down into distinct words and counting the occurrences of each word. When you run your data flow, Oracle Analytics creates a table in the database named DR$IndexName$I, which contains the token text and token count details. Use the DR$IndexName$I table to create a data set.
    • Under Outputs, use the Create option next to each field to select the columns to index.

    • Under Parameters, next to Text Column, click Select a column to choose the field you'd like to break down into separate words. Use the Reference Column<number> options to include one or more columns in the output data set.

    The database connection that you use for your data flow requires special database privileges. Check with your administrator that:

    • Your database account has been granted EXECUTE on the CTXSYS.CTX_DDL package.
    • You use an Oracle Analytics connection with the same user name as the schema where the source table exists. This best practice avoids access privilege issues when the data flow runs.
    • The database table column that you're analyzing has no existing CONTEXT index. If one exists, remove it before you run the text tokenization data flow.
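For reference, the EXECUTE privilege above is granted with a statement along these lines, and the generated token table can be queried directly once the data flow has run (the schema and index names here are placeholders):

```sql
-- Run as a privileged user: allow the data flow's schema to call
-- the Oracle Text DDL package.
GRANT EXECUTE ON CTXSYS.CTX_DDL TO my_schema;

-- After the data flow runs, inspect the generated tokens. TOKEN_TEXT holds
-- each distinct word and TOKEN_COUNT the number of rows it appears in.
SELECT token_text, token_count
FROM   DR$MY_INDEX$I
ORDER  BY token_count DESC;
```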
    Un-pivoting Data
    Transpose data that's stored in columns into row format. For example, you might want to transpose multiple columns that each show a revenue metric value for one year into a single revenue column with multiple value rows for the year dimension. You simply select the metric columns to transpose and specify a name for the new column. You get a new data set with fewer columns and more rows.
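The same reshaping is expressed in Oracle SQL with the UNPIVOT clause. A minimal sketch, assuming a hypothetical table with one revenue column per year:

```sql
-- Turn the revenue_2021..revenue_2023 columns into (year, revenue) rows:
-- fewer columns, more rows.
SELECT product, year, revenue
FROM   revenue_by_year
UNPIVOT (revenue
         FOR year IN (revenue_2021 AS '2021',
                      revenue_2022 AS '2022',
                      revenue_2023 AS '2023'));
```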