Apply a Predictive Model to a Data Set

You can use the data flow editor to score a predictive model on any data set. The predictive model outputs a new data set with columns containing predicted values that can be used for analysis and visualization.

  1. On the Home page, click Create and select Data Flow.
    The Add Data Set pane is displayed.
  2. Select the data set that you want to apply the model to. Click Add.
  3. In the data flow editor, click Add a step (+).
  4. Navigate to the bottom of the list and click Apply Model.
  5. In the Select Model dialog, select the model and click OK.
  6. Go to the Outputs section and inspect the columns returned by the model. Select the columns that you want to include in the output data set, and update the Column Name fields as needed.
    Output columns vary depending on the model type. For example, for numeric prediction, the output columns include PredictedValue and PredictedConfidence. For clustering, the output column is clusterId.
  7. Go to the Inputs section and inspect how the columns in the scoring data set were matched to the columns in the model. Adjust the column matching as needed.
    The Parameters section displays parameters specific to the model type. For example, if you use a clustering model for scoring, maximum null values present is a parameter that you can provide for the scoring process. This parameter is used for missing value imputation.
  8. In the data flow editor, click Add a step (+) and select Save Data.
  9. Enter a name in the Name field. Set data preferences as needed in the Treat As and Default Aggregation fields.
    When you save the data, the Apply Model step appends the model output columns that you selected to the data set.
  10. Click Save, enter a name and description, and click OK to save the data flow.
  11. Click Run Data Flow to create the data set that you can use for visualizations in your projects.
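
To make the result of the steps above concrete, the following is a minimal conceptual sketch of what applying a model to a data set amounts to: input columns are matched to model columns, each row is scored, and only the selected output columns are appended under the names you chose. It is not the product's implementation or API. The function apply_model, the toy model toy_predict, and the column names sqft, size, and predicted_price are all hypothetical; only PredictedValue and PredictedConfidence come from the text above.

```python
def apply_model(rows, predict, input_map, outputs):
    """Score each row and append the selected model output columns.

    rows      -- list of dicts (the scoring data set)
    predict   -- callable: dict of model inputs -> dict of model outputs
    input_map -- {model_column: data_set_column} (the Inputs matching)
    outputs   -- {model_output: column_name} (the selected Outputs)
    """
    scored = []
    for row in rows:
        # Match data set columns to the columns the model expects.
        model_inputs = {m: row[d] for m, d in input_map.items()}
        result = predict(model_inputs)
        # Copy the row so the original data set is left unchanged.
        new_row = dict(row)
        # Append only the outputs that were selected, under the chosen names.
        for out_name, col_name in outputs.items():
            new_row[col_name] = result[out_name]
        scored.append(new_row)
    return scored


def toy_predict(inputs):
    # Hypothetical numeric-prediction model, for illustration only.
    return {
        "PredictedValue": 2.0 * inputs["size"] + 1.0,
        "PredictedConfidence": 0.9,
    }


rows = [{"id": 1, "sqft": 10.0}, {"id": 2, "sqft": 20.0}]
scored = apply_model(
    rows,
    toy_predict,
    input_map={"size": "sqft"},
    outputs={"PredictedValue": "predicted_price"},
)
```

In this sketch, PredictedConfidence is not selected, so it does not appear in the scored rows, which mirrors leaving a column unselected in the Outputs section.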