Creating a Model Using R Scripted Technique

You can create models using a technique that you have defined in the Technique Registration window. The Execution Status log is not displayed in the Model Definition window for models created using the Standard R Engine.

To create a model using the R scripted technique, follow these steps:

1. Select the Add icon from the Model Management toolbar. The Model Definition window is displayed. The Add button is disabled if you have selected any Model ID in the grid.

Figure 9-3 Model Definition page

2. Enter the Model Definition details. The grid below the Model Details section displays the various tabs available for the selected technique; to update the required information in them, see the sections that describe those tabs. The common fields are described in the following table.

Table 9-1 Model Definition - Field and its Description

Field	Description
Model Name	Specify a model name for the model definition. Model Name is case sensitive and does not allow duplication. For example, the model name "Linear Regression" is not allowed if a model with the name "linear regression" exists. Ensure that the name contains no special characters such as `, {, }, ", ', ~, <, >, /, or \, and no multiple spaces.
Model Description	Enter a description of the model.
Do you like to script the model?	Select the checkbox to script the model in the Model Script pane.
Model Objective	Select the Model Objective from the drop-down list. You can also click the Add icon to create a Model Objective.
Technique	This field is disabled if you have chosen to script the model. Click to open the Technique Selection window. The pre-packaged techniques and the user-defined (registered and authorized) R techniques are listed in the Techniques pane. Click the + icon to expand the technique heading groups, select the required technique, click the Forward Arrow icon, and then click OK. The selected technique details are displayed in the Model Definition window. If you selected an R technique, click the Book icon to view the script.
Dataset	By default, the dataset of the Sandbox is displayed. You can change the dataset if necessary. Dataset selection is mandatory for models based on R scripted techniques if variables are declared in the R script. Click to open the Dataset Selection window. The available datasets are listed in the Datasets pane. Select a dataset and click to view the details of the selected dataset. Click to create a new dataset. For more details on creating a dataset, refer to the Creating Data Set section in the Oracle Financial Services Analytical Applications Infrastructure User Guide. You can create a dataset using any of the tables that are part of the production information domain. However, if you create a dataset with a table that is not part of the Sandbox and create a model using that dataset, you must deploy the model to the production Infodom and execute it there. Select the dataset on which the model is to be based and click the Forward Arrow icon. Ensure that the selected dataset is loaded with data; otherwise, model execution fails. You can select multiple datasets for models executed using the Standard R Engine. If you use multiple datasets, use at least one variable from each dataset. Click OK. Note: Datasets based on Derived Entities are not supported.
Language	This field is not displayed for technique-based models. Select the scripting language from the drop-down list. The options are: R Input Data Type
Type	This field is not displayed for technique-based models. Select the type of engine from the drop-down list. The options are Standard R Engine and ORE Engine. The ORE Engine option is not displayed in Hive-based Infodoms.
Input Data Type	This field is displayed only in Hive-based Infodoms, for models based on R scripted techniques or if you choose to script the model. Select the input data type. The options are ORE Frame, Data Frame, and HDFS File.
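
The Model Name rules in the table can be checked before you type a name into the window. The following Python sketch is only an illustration and is not part of the product: the function name `validate_model_name` and the message texts are hypothetical. It assumes that "multiple spaces" means two or more spaces in a row, and that duplicates are detected regardless of letter case, as the "Linear Regression" example suggests.

```python
import re

# Characters that the Model Name description lists as not allowed.
FORBIDDEN_CHARS = set('`{}"\'~<>/\\')


def validate_model_name(name, existing_names):
    """Return a list of problems found in `name`; an empty list means it passed."""
    problems = []
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    # Assumption: "multiple spaces" means two or more consecutive spaces.
    if re.search(r" {2,}", name):
        problems.append("multiple spaces in a row")
    # Assumption: names differing only in case count as duplicates,
    # following the "Linear Regression" / "linear regression" example.
    if name.casefold() in {n.casefold() for n in existing_names}:
        problems.append("duplicate of an existing model name")
    return problems
```

For instance, `validate_model_name("Linear Regression", ["linear regression"])` reports a duplicate, while a name such as "Credit Risk PD" passes against the same list.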

Fields marked with a red asterisk (*) are mandatory.
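
The Dataset description asks that, when several datasets are selected, the script use at least one variable from each of them. A minimal sketch of that check follows. It assumes that script variables can be matched to dataset columns by name; the function `datasets_without_variables` and the dataset and column names in the usage note are hypothetical.

```python
def datasets_without_variables(script_variables, dataset_columns):
    """Return the selected datasets that supply none of the script's variables.

    script_variables: iterable of variable names used in the script.
    dataset_columns: dict mapping a dataset name to an iterable of its columns.
    """
    used = set(script_variables)
    # A dataset is flagged when it shares no column with the script variables.
    return [name for name, columns in dataset_columns.items()
            if not used & set(columns)]
```

With datasets `DS_LOANS` (columns `BALANCE`, `RATE`) and `DS_CUSTOMERS` (columns `AGE`, `INCOME`), a script that uses only `BALANCE` and `RATE` would flag `DS_CUSTOMERS`, because no variable is drawn from it.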

3. After you have updated all the necessary details, click Save to save the model definition.
4. Click Preview Data to view the data of the selected dataset. The preview displays the primary keys and the attributes/columns of the tables in the selected dataset. Only the columns that are mapped to variables in the script are displayed.

Note:
In a Hive-based Sandbox information domain, previewing data takes a long time, and only 100 records are displayed.
5. Click Execute.
The Execution Status grid displays the model execution log dynamically.

Note:
For R-based models, execution may fail if the dataset contains internal joins. When a model is executed using the Standard R Engine with the new Cloudera jars, execution fails if the model queries exceed a certain limit. The workaround is to append UseNativeQuery=1 to the JDBC URL of the Hive schemas in which model definitions and executions happen. For example, jdbc:hive2://192.168.1.0:1000/default;UseNativeQuery=1
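
If you maintain the Hive JDBC URLs in a script or configuration tool, the workaround can be applied programmatically. The sketch below is an illustration under stated assumptions, not product code: the helper name `with_native_query` is hypothetical, the URL is assumed to follow the `jdbc:hive2://host:port/database;property=value` form, and the property name is compared without regard to case.

```python
def with_native_query(jdbc_url):
    """Append UseNativeQuery=1 to a Hive JDBC URL unless the property is already set."""
    # Connection properties follow the database name, separated by semicolons.
    properties = jdbc_url.split(";")[1:]
    for prop in properties:
        key = prop.split("=", 1)[0]
        # Assumption: the property name is matched case-insensitively.
        if key.lower() == "usenativequery":
            return jdbc_url  # already present; leave the URL unchanged
    return jdbc_url.rstrip(";") + ";UseNativeQuery=1"
```

The function leaves an existing UseNativeQuery setting untouched, whatever its value, so review such URLs by hand if the property might be set to 0.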