Before You Begin

This 15-minute tutorial shows you how to create a random dataset, train a predictive model, create a live scenario, and use the datasets and scenarios in visualizations.

Background

In Oracle Analytics, predictive models use several embedded machine learning algorithms to mine your datasets, predict a target value, or identify classes of records.

Oracle's machine learning functionality is for advanced data analysts who have an idea of what they're looking for in their data, are familiar with the practice of predictive analytics, and understand the differences between algorithms.

This is the first tutorial in Train and Apply Predictive Models in Oracle Analytics. Read the tutorials in the order listed.

What Do You Need?

  • Access to Oracle Analytics Cloud or Oracle Analytics Desktop

    When using Oracle Analytics Desktop, you must install machine learning (DVML) to use Diagnostics Analytics (Explain), Machine Learning Studio, or advanced analytics.

  • Download donation.xlsx to your computer

Create a Dataset

In this section, you create a dataset from the donation file. When Oracle Analytics loads numerical data, it treats the data as a measure. You also learn how to apply a custom date format.

  1. Sign in to Oracle Analytics.
  2. On the Home page, click Create, and then click Dataset.
  3. In Create Dataset, click Drop data file here or click to browse, select the donation.xlsx file, and then click Open.
  4. In Create Dataset Table from donation.xlsx, click OK.
  5. Click the donation tab.
  6. Right-click the DATE_POSTED column and select Convert to Date.
  7. In Convert to Date, select Custom from the Source Format list, and then enter dd.MMM.yyyy as the date format. Click Add Step.
  8. Click Save. In Save Dataset As, enter donation in Name, and then click OK.
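
To make the custom format concrete: dd.MMM.yyyy describes a two-digit day, an abbreviated month name, and a four-digit year, separated by periods. The following Python sketch parses a value of that shape outside Oracle Analytics. The sample value is made up, and `%d.%b.%Y` is an assumed Python equivalent of the pattern, not something Oracle Analytics uses internally. Month abbreviations are assumed to be in English.

```python
from datetime import datetime

# Assumed Python equivalent of the Oracle Analytics pattern dd.MMM.yyyy.
PY_FORMAT = "%d.%b.%Y"


def parse_date_posted(text: str) -> datetime:
    """Parse a value such as '05.Jun.2012' (hypothetical sample) into a datetime."""
    return datetime.strptime(text, PY_FORMAT)


sample = parse_date_posted("05.Jun.2012")
print(sample.date())  # 2012-06-05
```

If the values in your copy of the file are shaped differently, the pattern in step 7 needs to change to match.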

Visualize the Data

In this section, you create visualizations with the donation dataset as a baseline to compare with the workbook that uses a random set of the donation data.

  1. Click Create Workbook. Close the Auto Insights panel.
  2. Click the Canvas 1 menu. Select Canvas Properties.
  3. In Canvas Properties, click Auto Fit in the Layout row, select Freeform, and then click OK.
  4. Right-click My Calculations and select Create Calculation.
  5. In New Calculation, enter Number of Projects in Name. For the expression, enter Count and select Count. In the column placeholder, enter Proj and select PROJECTID. Click Validate and click Apply.
  6. Drag Number of Projects to the canvas.


    The tile visualization shows the number of projects in the dataset.

    Illustration: num_projects.png
  7. In the Data panel, hold down the Ctrl key and select TOTAL_DONATIONS and DATE_POSTED. Right-click and select Create the Best Visualization.
  8. In the Grammar panel, right-click DATE_POSTED, select Show By, and then select Quarter.
  9. In the Data panel, drag TOTAL_DONATIONS to the canvas.


    Illustration: 3_vizs.png
  10. Click Save. In Save Workbook, enter Donations_Workbook in Name, and then click Save. Click Go back.
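
The two aggregations behind these visualizations can be sketched in Python. This is an illustration only, on made-up rows. It assumes that Count(PROJECTID) counts rows with a non-null PROJECTID and that Show By Quarter groups DATE_POSTED by calendar year and quarter. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (PROJECTID, DATE_POSTED, TOTAL_DONATIONS).
rows = [
    ("p1", date(2012, 6, 5), 100.0),
    ("p2", date(2012, 8, 20), 250.0),
    ("p3", date(2012, 11, 2), 75.0),
]


def number_of_projects(rows):
    """Count rows with a non-null PROJECTID, in the spirit of Count(PROJECTID)."""
    return sum(1 for project_id, _, _ in rows if project_id is not None)


def donations_by_quarter(rows):
    """Sum TOTAL_DONATIONS per (year, quarter) of DATE_POSTED."""
    totals = defaultdict(float)
    for _, posted, amount in rows:
        quarter = (posted.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, ...
        totals[(posted.year, quarter)] += amount
    return dict(totals)


print(number_of_projects(rows))
print(donations_by_quarter(rows))
```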

Create a Random Dataset

  1. On the Home page, click Create, and then click Data Flow.
  2. In Add Data, select the donation dataset, and then click Add.
  3. In Data Flow Steps, double-click Filter. In Filter, click Add Filter. From Available Data, select DATE_POSTED. In Range values, enter 6/01/2012 in the first calendar text box. Enter 5/31/2014 in the second calendar text box. Click outside the dialog.


    Illustration: dataset_range_dates.png
  4. Click the donation node. Double-click Add Columns.
  5. In Add Columns, enter Random Filter in Name, and then enter RAND() in the Expression field. Click Validate, and then click Apply.
  6. Click the Filter node and click Add Filter next to the DATE_POSTED filter. From Available Data, click Random Filter. Click the End field, enter .15 to keep only rows whose random value is at most 0.15 (roughly 15% of the data), and then click End.


    Illustration: filter_dataset.png
  7. Click Add a step on the Filter node, and then click Select Columns. In Select Columns, select Random Filter from the Selected Columns list, and then click Remove selected.


    You don't need the Random Filter column in the dataset.

    Illustration: remove_random_filter.png
  8. From Data Flow Steps, drag Save Dataset to the Select Columns node. In Save Dataset, enter sample_donation_data.
  9. In Save Dataset under Columns, click Sum in the PROJECTID row, and then select Count from the Default Aggregation list.
  10. In the SCH_LATITUDE and SCH_LONGITUDE rows, click Measure in the Treat As column, and then select Attribute.


    Illustration: sample_donation_data_df.png
  11. Click Save. In Save Data Flow As, enter sample_donations_data_df, and then click OK.
  12. Click Run Data Flow to create the sample dataset.
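
The sampling logic of this data flow can be sketched in Python: keep rows inside the date range, draw a random number per row, and keep the row only when that number is at most 0.15. This is an illustration on made-up rows, not what Oracle Analytics runs internally. The function name, the `seed` parameter, and the inclusive date bounds are assumptions.

```python
import random
from datetime import date

START, END = date(2012, 6, 1), date(2014, 5, 31)

# Hypothetical rows; the real dataset has many more columns.
rows = [
    {"PROJECTID": f"p{i}", "DATE_POSTED": date(2011 + i % 5, 1 + i % 12, 1)}
    for i in range(1000)
]


def sample_rows(rows, start, end, fraction=0.15, seed=None):
    """Keep rows in [start, end] whose per-row random draw is <= fraction."""
    rng = random.Random(seed)
    kept = []
    for row in rows:
        if not (start <= row["DATE_POSTED"] <= end):
            continue  # the DATE_POSTED range filter
        # The draw plays the role of the temporary Random Filter column;
        # it is never stored on the row, so there is nothing to remove later.
        if rng.random() <= fraction:
            kept.append(row)
    return kept


sample = sample_rows(rows, START, END, seed=42)
print(len(sample), "of", len(rows), "rows kept")
```

Because each row is kept independently, the sample size varies from run to run and is only approximately 15% of the rows in the date range.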

Examine the Sample Donations Dataset

  1. Click Go back. On the Home page, select the sample_donation_data dataset, click Actions, and then select Create Workbook.
  2. Click the Canvas 1 menu. Select Canvas Properties.
  3. In Canvas Properties, click Auto Fit in the Layout row, select Freeform, and then click OK.
  4. Drag PROJECTID to the canvas.


    Because the sample data is a random selection of records from the dataset, your PROJECTID visualization might not match the results in this visualization.

    Illustration: sample_data_projectid_viz.png
  5. In the Data panel, hold down the Ctrl key and select TOTAL_DONATIONS and DATE_POSTED. Right-click and then select Create the Best Visualization.
  6. In the Grammar panel, right-click DATE_POSTED, select Show By, and then select Quarter.
  7. In the Data panel, drag TOTAL_DONATIONS to the canvas.


    Illustration: sample_data_total_donations_date.png
  8. Click Save. In Save Workbook, enter donations_random_sample, and then click Save. Click Go back to return to the Home page.

Create a Training Model

  1. On the Home page, click Create, and then click Data Flow.
  2. In Add Data, select the sample_donation_data dataset, and then click Add.
  3. From Data Flow Steps, double-click Train Numeric Prediction.
  4. In Select Train Numeric Prediction Model Script, select Elastic Net Linear Regression for model training, and then click OK.
  5. In Train Numeric Prediction, click Select a column. From Available data, select TOTAL_DONATIONS as the Target.
  6. Click the Save Model node in the data flow. Enter elastic_model_1 in Model name.


    Illustration: train_model_dataflow.png
  7. Click Save. In Save Data Flow As, enter elastic_train_df in Name, and then click OK.
  8. Click Run Data Flow.
  9. In the message "Data Flow elastic_train_df complete", click Go back to return to the Home page.
  10. On the Home page, click Machine Learning to view the elastic_model_1 output. Click Actions, and then select Inspect.


    Illustration: elastic_model_1_quality.png
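
For readers who want a feel for what an elastic net regression does, the following is a minimal pure-Python sketch. It is not the Oracle Analytics implementation. It uses a common textbook formulation, a linear model fitted by coordinate descent with a combined L1 and L2 penalty. The parameter names `alpha` and `l1_ratio`, the function names, and the sample data are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, iterations=200):
    """Fit y ~ intercept + X.w by coordinate descent.

    Minimizes (1/2n) * sum(residual^2) + alpha * l1_ratio * |w|_1
              + 0.5 * alpha * (1 - l1_ratio) * |w|_2^2.
    """
    n, p = len(X), len(X[0])
    # Center features and target so the intercept can be recovered afterwards.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]

    w = [0.0] * p
    for _ in range(iterations):
        for j in range(p):
            # Covariance of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = yc[i] - sum(Xc[i][k] * w[k] for k in range(p) if k != j)
                rho += Xc[i][j] * partial
            rho /= n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            denom = z + alpha * (1 - l1_ratio)
            # A constant feature with no L2 penalty carries no information.
            w[j] = soft_threshold(rho, alpha * l1_ratio) / denom if denom else 0.0
    intercept = y_mean - sum(w[j] * x_means[j] for j in range(p))
    return intercept, w


def predict(intercept, w, row):
    """Return the predicted target for one row of feature values."""
    return intercept + sum(wj * xj for wj, xj in zip(w, row))


# Hypothetical data that follows y = 2x + 1 exactly.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 9.0]
intercept, w = fit_elastic_net(X, y, alpha=0.0)
print(intercept, w)
```

With `alpha=0.0` the penalty vanishes and the fit reduces to ordinary least squares. Larger values of `alpha` shrink the coefficients toward zero, which is the trade-off an elastic net model makes to reduce overfitting.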

Apply the Trained Model to a Workbook

In this section, you add the predicted value for total donations to the Total Donations by Date Posted (Quarter) visualization to view the results of using the elastic model.

  1. Click Workbooks and Reports.
  2. On the Home page, search for your donations_random_sample workbook.
  3. In the donations_random_sample workbook, click Actions, and then select Open. Click Edit.
  4. Click the PROJECTID visualization, click Menu, and then select Delete Visualization.
  5. Right-click the TOTAL_DONATIONS visualization and select Delete Visualization.
  6. In the Data panel, click Add, and then click Create Scenario.
  7. In Create Scenario - Select Model, select elastic_model_1, and then click OK.


    Illustration: scenario_in_data.png
  8. Click the visualization. In the Data panel, expand elastic_model_1, select TOTAL_DONATIONS Prediction, and then drag it to Values (Y-Axis) in the Grammar panel.


    The green line represents the actual donations data by date posted. The orange line represents the predicted donations.

    Illustration: total_donat_prediction.png
  9. In the Data panel, select SCH_METRO and drag it to Trellis Columns in the Grammar panel.


    The visualization shows the donations data divided into school metro groups: rural, suburban, and urban.

    Illustration: donations_by_sch_metro.png
  10. In the Grammar panel, click the X in SCH_METRO to remove it from the visualization. Click Save.
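
The comparison that the trellis view makes visually, actual versus predicted donations per school metro group, can also be sketched as a small aggregation. The rows, the `PREDICTION` key, and the function name below are hypothetical. The numbers are made up and are not results from the tutorial's model.

```python
from collections import defaultdict

# Hypothetical rows with an actual value and a model prediction per project.
rows = [
    {"SCH_METRO": "urban", "TOTAL_DONATIONS": 120.0, "PREDICTION": 110.0},
    {"SCH_METRO": "urban", "TOTAL_DONATIONS": 80.0, "PREDICTION": 95.0},
    {"SCH_METRO": "rural", "TOTAL_DONATIONS": 60.0, "PREDICTION": 50.0},
    {"SCH_METRO": "suburban", "TOTAL_DONATIONS": 40.0, "PREDICTION": 45.0},
]


def totals_by_metro(rows):
    """Sum actual and predicted donations for each SCH_METRO group."""
    totals = defaultdict(lambda: {"actual": 0.0, "predicted": 0.0})
    for row in rows:
        group = totals[row["SCH_METRO"]]
        group["actual"] += row["TOTAL_DONATIONS"]
        group["predicted"] += row["PREDICTION"]
    return {metro: dict(values) for metro, values in totals.items()}


for metro, values in sorted(totals_by_metro(rows).items()):
    print(metro, values["actual"], values["predicted"])
```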

Next Steps

Inspect and Modify the Prediction Model