5.4.1.1 Model Definition Maintenance

This topic provides the systematic instructions to maintain the use case details, define the use case type, and specify the data source details.

Specify the User ID and Password, and log in to the Home screen.
  1. On the Home screen, click Machine Learning. Under Machine Learning, click Model Definition.
  2. On the View Model Definition screen, click the Unlock icon on the use case tile to unlock an existing definition, or click the Add button to create a new model definition.
    The Model Definition screen displays.
  3. Specify the fields on the Model Definition screen.

    Note:

    The fields marked as Required are mandatory.
    For more information on fields, refer to the field description table.

    Table 5-7 Model Definition – Field Description

    Field Description
    Use Case Name Specify the name of the Use Case.
    Description Specify the description of the Use Case.
    Use Case Type Select the type of Use Case.

    Refer to Frameworks Supported for details.

    Product Processor Select the product to which the use case belongs.
    Training Data Source Specify the Table or View name used as the data source to train the model.
    Unique Identifier Select the column name to uniquely identify a record.

    Note:

    Column name is a function of table/view design.
    Target Column Select the column whose value is predicted by training the model.

    Note:

    Column name is a function of table/view design.
    Positive Target Value This field is enabled only if the selected Use Case Type is CLASSIFICATION; it is disabled for REGRESSION. It displays the distinct values from the target column.
    Tablespace Specify a valid tablespace. All model-related data is persisted in this tablespace.
    Inference Data Source Specify the Table or View that captures the data to be used for making predictions.

    The inference data source holds the current data for which the target is predicted using the built model, unlike the training data, where the target is already provided (see the sketch following this procedure).

    Partition Column Names Specify the column names to slice data.

    Refer to Partitioned Model for details.

    Selected Algorithm Select the algorithm from the list to build the model.

    For REGRESSION, this field should be left null to allow the framework to select the best-fit algorithm to build the model.

    Model Error Statistics Select the model error statistics.

    By default, RMSE is selected for REGRESSION.

    The user can also select MAE.

    Note:

    This field is disabled for CLASSIFICATION.
  4. Click Save to save the details.
    The user can view the configured details in the Model Definition screen.
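
The difference between the Training Data Source and the Inference Data Source in Table 5-7 can be illustrated with a minimal sketch. The table names, column names, and use of pandas below are illustrative assumptions only and are not part of the product's schema.

import pandas as pd

# Hypothetical training data source: the target column (DEFAULT_FLAG) is
# already populated, so the model can learn from known outcomes.
training_df = pd.DataFrame({
    "CUSTOMER_ID":   [101, 102, 103],       # unique identifier
    "ANNUAL_INCOME": [52000, 81000, 43000],
    "LOAN_AMOUNT":   [10000, 25000, 15000],
    "DEFAULT_FLAG":  [0, 1, 0],             # target column (known values)
})

# Hypothetical inference data source: the same feature columns, but the
# target is unknown; these are the records the trained model will score.
inference_df = pd.DataFrame({
    "CUSTOMER_ID":   [201, 202],
    "ANNUAL_INCOME": [67000, 39000],
    "LOAN_AMOUNT":   [20000, 12000],
})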

Cost Matrix:

This button is enabled ONLY for CLASSIFICATION type of use cases.

Any classification model can make two kinds of errors:

Table 5-8 Classification Type - Error

Actual Value Predicted Value Error Type
1 0 False Negative
0 1 False Positive

This screen is used to bias the model toward minimizing one of the error types by adding a penalty cost.

All penalty costs must be positive.

Table 5-9 Classification Type - Penalty

Actual Value Predicted Value Penalty Cost
1 0 6
0 1 2

The default is zero cost for all combinations.

Biasing the model is a trade-off with prediction accuracy. The business determines whether a classification model is required to be biased.
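
As an illustration of how a cost matrix biases a classifier, the sketch below chooses the class with the lowest expected cost rather than the highest probability. The probabilities, the helper function, and the Python form are assumptions for illustration only; the framework applies the cost matrix internally during training and scoring.

# Penalty values follow Table 5-9; the probability is hypothetical.
cost = {            # cost[(actual, predicted)]
    (1, 0): 6.0,    # false negative penalty
    (0, 1): 2.0,    # false positive penalty
    (0, 0): 0.0,
    (1, 1): 0.0,
}

def predict_with_cost(p_positive: float) -> int:
    """Return the class (0 or 1) with the lower expected misclassification cost."""
    expected_cost_if_0 = p_positive * cost[(1, 0)]        # risk of missing a true 1
    expected_cost_if_1 = (1 - p_positive) * cost[(0, 1)]  # risk of flagging a true 0
    return 0 if expected_cost_if_0 < expected_cost_if_1 else 1

# With the penalties above, a record with only a 30% positive probability is
# still predicted as 1, because a missed positive (cost 6) is treated as worse
# than a false alarm (cost 2).
print(predict_with_cost(0.30))   # prints 1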

  1. Click Cost Matrix button to launch the screen.
    The Cost Matrix screen displays.
  2. On the Cost Matrix screen, specify the relevant penalty cost.
  3. Click Save to save and close the Cost Matrix screen and return to the Model Definition screen.

Correlation:

Multicollinearity occurs when two or more independent variables are highly correlated with one another in a model.

Multicollinearity may not affect the accuracy of the model as much, but it reduces the reliability of model interpretation.

Irrespective of CLASSIFICATION or REGRESSION, all use cases must be evaluated for Correlation.

This button displays an orange mark if the evaluation is pending.
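
The evaluation performed on the Correlation Analysis screen is, in essence, a pairwise correlation check against a threshold. The sketch below shows that idea, assuming pandas and hypothetical feature names; the actual screen computes this from the training data source. Moving a feature to the Ignore Features list corresponds to dropping its column and re-running the same check until no pair exceeds the threshold.

import pandas as pd

def correlated_pairs(df: pd.DataFrame, threshold: float = 0.5, method: str = "pearson"):
    """Return feature pairs whose absolute correlation exceeds the threshold."""
    corr = df.corr(method=method)
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            value = corr.iloc[i, j]
            if abs(value) > threshold:
                pairs.append((cols[i], cols[j], round(value, 3)))
    return pairs

# Hypothetical training features; LOAN_AMOUNT and EMI move almost in lockstep,
# so at least that pair is expected to be reported as correlated.
features = pd.DataFrame({
    "ANNUAL_INCOME": [52, 81, 43, 60, 75],
    "LOAN_AMOUNT":   [10, 25, 15, 18, 24],
    "EMI":           [1.1, 2.6, 1.5, 1.9, 2.5],
})
print(correlated_pairs(features, threshold=0.5))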

  1. Click Correlation button to launch the screen.
    The Correlation Analysis screen displays.

    Figure 5-5 Correlation Analysis



  2. Select the required fields on Correlation Analysis screen.
    For more information on fields, refer to the field description table.

    Table 5-10 Correlation Analysis – Field Description

    Field Description
    Threshold Value Select the threshold value.

    The value can be set between 0.1 and 0.9.

    Note:

    By default, the value is set as 0.5.
    Type of Correlation Select the type of correlation.

    By default, Pearson is selected.

    The formula used for the calculation is different for each type.

    Pairwise Correlation Displays the output of the Correlation Validation.
    Analyzed Features Displays the distinct analyzed features from the Pairwise Correlation.
    Ignore Features A user-defined list created from the Analyzed Features.
  3. Click Refresh to initiate the evaluation process.
    The Correlation Analysis - Pairwise Correlation screen displays.

    Figure 5-6 Correlation Analysis - Pairwise Correlation



  4. Move ONE of the Analyzed Features to the Ignore Features list.
  5. Click Refresh and re-evaluate the Correlation as described in the preceding steps.
  6. Repeat the previous two steps, adding one feature at a time to the Ignore Features list, until Pairwise Correlation displays zero correlated pairs.
  7. Attempting to exit the screen midway without achieving zero Pairwise Correlation will display an error message.
    The Error Message screen displays.
  8. After successful Correlation Evaluation, the orange highlight on the Correlation button is removed.
  9. Complete the Correlation Evaluation and, for CLASSIFICATION use cases, the Cost Matrix definition.
  10. Click Save to create the new Model Definition.
    The user can view the configured details in the View Model Definition screen.

Model Metrices

Once the user has successfully trained the Machine Learning model, the user can score/predict the model outcomes as required by the use case. The user can view the Model Metrices screen only after training the model successfully. Refer to the Model Training and Scoring section for training the model.

  1. Click Model Metrices to view the Model Metrices details.
    The Model Metrices screen displays. For more information on fields, refer to the field description table.

    Table 5-11 Model Metrices – Field Description

    Field Description
    Model Partitions Select the model partitions from the drop-down list.

    If the model has been designed with partitions, the list displays the partition values based on the underlying data of the defined partition column; otherwise, it displays FULL MODEL.

    Metrices Displays the various model attributes, as per the best model identified and trained. The number of model attributes is a function of the algorithm and the underlying pattern of the data.

    Some attributes are common for all models, as listed below (a brief sketch of the train/test error statistics follows this table).

    Model Name

    Algorithm

    INF_TIME (Inference Time)

    <Model metric>(Train)

    <Model metric>(Test)

    Value Displays the value of the attribute.
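
For REGRESSION use cases, the <Model metric>(Train) and <Model metric>(Test) attributes typically carry the selected model error statistic, RMSE by default or MAE (see Table 5-7). A minimal sketch of how these two statistics are computed, using hypothetical actual and predicted values, is shown below; the values displayed on the Model Metrices screen are produced by the framework itself.

import math

def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large deviations more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error: the average magnitude of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical target values from a held-out test partition.
actual    = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 205.0, 240.0]

print(f"RMSE(Test): {rmse(actual, predicted):.2f}")  # approximately 9.01
print(f"MAE(Test):  {mae(actual, predicted):.2f}")   # 8.75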