Example: Batch Scoring using a Clustering Model

This is an example of a batch scoring job submission for a K-Means clustering model. OML Services supports batch scoring for regression, classification, clustering, and feature extraction models.

Prerequisites

You need the ID of the model to use for scoring. To get the model ID, send a GET request to the deployment endpoint and specify the model URI. You provide the model URI when you deploy the model, either through the AutoML UI or through a REST client.

Batch Scoring Workflow

This is the workflow of batch scoring of a clustering model through the OML Services REST interface:
  1. Deploy the model through the AutoML UI
  2. Authenticate the database user and obtain the access token
  3. Get the model ID of the model to be used for scoring
  4. Create a batch scoring job
  5. View the details of the batch scoring job
  6. View the output of the batch scoring job
  7. Update, disable, and delete a batch scoring job

Note:

To send REST requests, use cURL in a Linux or macOS terminal, or use another REST client such as Postman.

1: Deploy a Model

Before you can use a machine learning model for batch scoring, you must deploy the model.
  1. If you opt for the automated way to build machine learning models, go to the AutoML UI and create an AutoML experiment.
  2. Deploy the model.

2: Authenticate the Database User and Obtain the Access Token

To send requests to OML Services, you must first obtain an authentication token by using your Oracle Machine Learning (OML) account credentials. To authenticate and obtain a token, use cURL with the -d option to pass your OML account credentials to the /oauth2/v1/token REST endpoint of the Oracle Machine Learning User Management Cloud Service. Run the following command to obtain the access token:

$ curl -X POST --header 'Content-Type: application/json' \
       --header 'Accept: application/json' \
       -d '{"grant_type":"password", "username":"<yourusername>", "password":"<yourpassword>"}' \
       "<oml-cloud-service-location-url>/omlusers/api/oauth2/v1/token"
Here,
  • -X POST specifies a POST request to the HTTP server
  • --header defines the headers required for the request (application/json)
  • -d sends the username and password authentication credentials as data in the POST request
  • Content-Type defines the format of the request body (JSON)
  • Accept defines the response format (JSON)
  • yourusername is the user name of an Oracle Machine Learning user with the default OML_DEVELOPER role
  • yourpassword is the password for the user name
  • oml-cloud-service-location-url is a URL containing the REST server portion of the Oracle Machine Learning User Management Cloud Service instance URL, which includes the tenancy ID and database name. You can obtain the URL from the Development tab in the Service Console of your Oracle Autonomous Database instance.
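Later requests in this example read the token from a shell variable named token. The following is a minimal sketch of one way to set it. It assumes bash and jq are installed, and it assumes that the token endpoint returns the token in a field named accessToken; check that field name against your actual response. The helper names oml_auth_payload and oml_get_token are hypothetical. Building the credential payload with printf keeps the user name and password inside the JSON string rather than splicing them between shell quote fragments.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for obtaining an OML access token.

# Print the JSON credential payload for the token endpoint.
# This sketch does not escape double quotes or backslashes in the values.
oml_auth_payload() {
  local username="$1" password="$2"
  printf '{"grant_type":"password","username":"%s","password":"%s"}' \
    "$username" "$password"
}

# Request a token and print it.
# Assumption: the response JSON stores the token in an "accessToken" field.
oml_get_token() {
  local base_url="$1" username="$2" password="$3"
  curl -s -X POST \
       --header 'Content-Type: application/json' \
       --header 'Accept: application/json' \
       -d "$(oml_auth_payload "$username" "$password")" \
       "${base_url}/omlusers/api/oauth2/v1/token" | jq -r '.accessToken'
}
```

For example: export token=$(oml_get_token "<oml-cloud-service-location-url>" "<yourusername>" "<yourpassword>")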

3: Get the Model ID

To get the modelId, send a GET request to the deployment endpoint and specify the model URI.

Note:

The model URI is provided by the user when deploying the model using the AutoML UI or when deploying the model through a REST client.

GET request to obtain the modelId:

$ curl -X GET "<oml-cloud-service-location-url>/omlmod/v1/deployment/KM_CLUS_MOD" \
       --header "Authorization: Bearer ${token}" | jq '.modelId'

In this example, the model URI is KM_CLUS_MOD.

The GET request returns the following:

"4632a963-4340-4ca5-ba30-aff77a4b857a"
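By default, jq prints the modelId as a JSON string enclosed in double quotes. To reuse the ID in later requests, you can save it with jq -r, which prints the raw string. The sketch below also adds a hypothetical is_model_id check. It assumes that model IDs are UUIDs in the 8-4-4-4-12 hexadecimal form shown in this example, a format the service does not necessarily guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical check: does the argument look like a UUID such as
# 4632a963-4340-4ca5-ba30-aff77a4b857a?
is_model_id() {
  local re='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
  [[ "$1" =~ $re ]]
}

# Example usage (requires network access, a valid token, and jq):
# model_id=$(curl -s -X GET "<oml-cloud-service-location-url>/omlmod/v1/deployment/KM_CLUS_MOD" \
#              --header "Authorization: Bearer ${token}" | jq -r '.modelId')
# is_model_id "$model_id" || echo "Unexpected modelId: $model_id" >&2
```

A value that still carries its JSON quotes fails the check, which catches a forgotten -r.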

4: Create and Submit a Batch Scoring Job

After obtaining the access token, you can send a POST request to OML Services to create a batch scoring job.

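Batch scoring jobs are initiated by sending a POST request to the /omlmod/v1/jobs endpoint. The details of the batch scoring job are specified in the jobProperties parameter, which includes required and optional parameters. The schedule settings, such as jobStartDate, jobEndDate, repeatInterval, and maxRuns, are specified in the jobSchedule parameter.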

Required Parameters

The required parameters are:
  • jobName specifies the name of the submitted job.
  • jobType specifies the type of job to be run, which is MODEL_SCORING for batch scoring jobs.
  • inputData is the name of the table or view to read input for the batch scoring job.
  • outputData is the name of the output table that stores the batch scoring results. The full name of the table takes the form {jobId}_{outputData}.
  • modelId is the ID for the model used for scoring.
  • supplementalColumnNames is an array of columns from the input table or view that are used to identify rows in the output table.

Optional Parameters

The optional parameters are:
  • jobStartDate is the start date and time of the job run, specified in jobSchedule. If not specified, the job runs immediately.
  • jobEndDate is the end date and time of the job run, specified in jobSchedule.
  • disableJob is a flag to disable the job at submission. If not set, the job is enabled at submission.
  • maxRuns is the maximum number of times the job runs according to the schedule, specified in jobSchedule.
  • inputSchemaName is the database schema that owns the input table or view. If not specified, the input schema is the schema of the user who obtained the token.
  • outputSchemaName is the database schema that owns the output table. If not specified, the output schema is the same as the input schema.
  • jobDescription is a description of the job provided by the user.
  • jobServiceLevel is the database service level for the job, which can be LOW, MEDIUM, or HIGH.
  • recompute is a flag that determines whether to replace the output table. The default is true.
  • topN filters the results: for classification, it returns the N highest probabilities; for clustering, it returns the N most probable cluster assignments.
  • topNDetails returns the N most important prediction details contributing to the probability scores.
Send the following POST request to OML Services to create and submit the batch scoring job:
$ curl -X POST "<oml-cloud-service-location-url>/omlmod/v1/jobs" \
     --header "Authorization: Bearer ${token}" \
     --header 'Content-Type: application/json' \
     --data '{
         "jobSchedule": {
             "jobStartDate": "2023-03-23T20:58:46Z",
             "jobEndDate":   "2023-03-28T20:58:46Z",
             "repeatInterval": "FREQ=DAILY",
             "maxRuns": "4"
         },
         "jobProperties": {
             "jobName": "KM_CLUS_MOD1",
             "jobType": "MODEL_SCORING",
             "modelId": "4632a963-4340-4ca5-ba30-aff77a4b857a",
             "inputData": "CUSTOMERS360",
             "outputData": "KM_CLUS_PRED1",
             "supplementalColumnNames": ["CUST_ID"],
             "topN": 2,
             "topNDetails": 2,
             "recompute": "true"
         }
     }' | jq
In this example:
  • The model uses the CUSTOMERS360 dataset to predict cluster membership by customer. To reproduce this example, you must create the CUSTOMERS360 table. For more information, see Create the CUSTOMERS360 Table.
  • The model URI is KM_CLUS_MOD.
  • modelId is 4632a963-4340-4ca5-ba30-aff77a4b857a.
  • jobSchedule runs the job daily (FREQ=DAILY) between jobStartDate and jobEndDate, with a maximum of 4 runs.
  • inputData is CUSTOMERS360, the table to read input for the batch scoring job.
  • outputData is KM_CLUS_PRED1, the output table that stores the job results.
  • supplementalColumnNames lists CUST_ID, which is returned with the predictions to identify each customer's cluster membership.
  • topN and topNDetails are both set to 2 to request the two most probable cluster assignments and the two most important details contributing to the cluster assignment predictions.
  • recompute is set to true to replace the results table with each run.
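JSON does not allow comments, so the request body must contain only the JSON object itself. If you script the submission, one option is to generate the body from shell variables. The following is a minimal bash sketch. The function name scoring_job_body is hypothetical. It covers only the parameters used in this example, with repeatInterval, topN, topNDetails, and recompute fixed to the example's values. It performs no JSON escaping, so the arguments must not contain double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a MODEL_SCORING job request body.
# Arguments: job name, model ID, input table, output table,
#            supplemental column, start date, end date, max runs.
scoring_job_body() {
  local job_name="$1" model_id="$2" input="$3" output="$4" supp_col="$5"
  local start="$6" end="$7" max_runs="$8"
  printf '{"jobSchedule":{"jobStartDate":"%s","jobEndDate":"%s","repeatInterval":"FREQ=DAILY","maxRuns":"%s"},' \
    "$start" "$end" "$max_runs"
  printf '"jobProperties":{"jobName":"%s","jobType":"MODEL_SCORING","modelId":"%s",' \
    "$job_name" "$model_id"
  printf '"inputData":"%s","outputData":"%s","supplementalColumnNames":["%s"],' \
    "$input" "$output" "$supp_col"
  # Values fixed to this example: top 2 clusters, top 2 details, overwrite output.
  printf '"topN":2,"topNDetails":2,"recompute":"true"}}\n'
}
```

You could then pass the result to cURL with --data "$(scoring_job_body KM_CLUS_MOD1 "$model_id" CUSTOMERS360 KM_CLUS_PRED1 CUST_ID 2023-03-23T20:58:46Z 2023-03-28T20:58:46Z 4)".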

Response of the Job Request

After the job is submitted successfully, you receive a response that contains the jobId. Note the jobId, because you need it to retrieve job details or to perform actions on the job. Here is an example of a response:

{
  "jobId": "OML$65977733_A951_42CD_9E3D_5E800604EBDB",
  "links": [
    {
      "rel": "self",
      "href": "<oml-cloud-service-location-url>/omlmod/v1/jobs/OML%2465977733_A951_42CD_9E3D_5E800604EBDB"
    }
  ]
}
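The href in the response contains OML%24... instead of OML$..., because %24 is the percent-encoded form of the $ character. If you construct job URLs yourself and want to match that form, the following is a minimal bash sketch of that single substitution. The function name encode_job_id is hypothetical. It encodes only $, the one character in this example's job ID that the response encodes, and leaves all other characters unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every "$" in a job ID with "%24".
# Only "$" is handled; other characters are left unchanged.
encode_job_id() {
  local id="$1"
  printf '%s\n' "${id//\$/%24}"
}
```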

5: View Details of the Batch Scoring Job

To view the details of your submitted job, send a GET request to the /omlmod/v1/jobs/{jobId} endpoint, where jobId is the ID returned in response to the successful submission of your batch scoring job in the previous step.

Run the following commands to view the job details:

  1. First, export the job ID to a variable:

    $ export jobid='OML$65977733_A951_42CD_9E3D_5E800604EBDB'    # save job ID to variable
  2. Send a GET request to the /omlmod/v1/jobs/{jobID} endpoint:
    $ curl -X GET "<oml-cloud-service-location-url>/omlmod/v1/jobs/${jobid}"  \
           --header 'Accept: application/json' \
           --header 'Content-Type: application/json' \
           --header "Authorization: Bearer ${token}" | jq
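The job ID contains a $ character, which is why step 1 wraps it in single quotes. Inside double quotes, the shell would expand $6 (the sixth positional parameter, which is normally empty) and silently corrupt the ID. The following snippet demonstrates the difference. It clears the positional parameters first so that the result is predictable.

```shell
#!/usr/bin/env bash
set --  # clear positional parameters so that "$6" expands to an empty string

# Single quotes: the "$" is kept literally.
jobid_single='OML$65977733_A951_42CD_9E3D_5E800604EBDB'

# Double quotes: "$6" is expanded (to an empty string here), so the
# characters "$6" disappear from the ID.
jobid_double="OML$65977733_A951_42CD_9E3D_5E800604EBDB"

echo "single-quoted: ${jobid_single}"
echo "double-quoted: ${jobid_double}"
```

Using "${jobid}" inside double quotes in the cURL URL is safe, because the shell does not re-expand the value stored in a variable.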

Response of the Job Details Request

Here is an example response to the job details request:
{
  "jobId": "OML$65977733_A951_42CD_9E3D_5E800604EBDB",
  "jobRequest": {
    "jobSchedule": {
      "jobStartDate": "2023-03-23T20:58:46Z",
      "repeatInterval": "FREQ=DAILY",
      "jobEndDate": "2023-03-28T20:58:46Z",
      "maxRuns": 4
    },
    "jobProperties": {
      "jobType": "MODEL_SCORING",
      "inputSchemaName": null,
      "outputSchemaName": null,
      "outputData": "KM_CLUS_PRED1",
      "jobDescription": null,
      "jobName": "KM_CLUS_MOD1",
      "disableJob": false,
      "jobServiceLevel": null,
      "inputData": "CUSTOMERS360",
      "supplementalColumnNames": [
        "CUST_ID"
      ],
      "topN": 2,
      "recompute": true,
      "modelId": "4632a963-4340-4ca5-ba30-aff77a4b857a",
      "topNDetails": 2
    }
  },
  "jobStatus": "CREATED",
  "dateSubmitted": "2023-03-23T20:54:53.248736Z",
  "links": [
    {
      "rel": "self",
      "href": "<oml-cloud-service-location-url>/omlmod/v1/jobs/OML%2465977733_A951_42CD_9E3D_5E800604EBDB"
    }
  ],
  "jobFlags": [],
  "state": "SCHEDULED",
  "enabled": true,
  "runCount": 0,
  "nextRunDate": "2023-03-23T20:58:46Z"
}

6: View the Batch Scoring Job Output

After your job has run, you can view the results in the table that you specified with the outputData parameter in your job request. The full name of the table is {jobId}_{outputData}. To check whether your job has run, send a request to view its details and check, for example, the runCount value.
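Because the output table name combines the job ID and the outputData value, a small helper can build the query for you. The following is a minimal bash sketch. The function name output_table_query is hypothetical, and it relies only on the {jobId}_{outputData} naming rule described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a query against the {jobId}_{outputData} table.
# Arguments: job ID, outputData value, ORDER BY column, number of rows.
output_table_query() {
  local job_id="$1" output_data="$2" order_col="$3" rows="$4"
  printf 'SELECT * FROM %s_%s\nORDER BY %s\nFETCH FIRST %s ROWS ONLY;\n' \
    "$job_id" "$output_data" "$order_col" "$rows"
}
```

Calling output_table_query "$jobid" KM_CLUS_PRED1 CUST_ID 5 with the job ID from this example prints the same query as the SQL script that follows.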

Run this SQL script to query the output table associated with this example:
%sql

SELECT * FROM OML$65977733_A951_42CD_9E3D_5E800604EBDB_KM_CLUS_PRED1
ORDER BY CUST_ID
FETCH FIRST 5 ROWS ONLY;
The query returns the supplemental columns that you specified (CUST_ID), along with the cluster ID predictions and prediction details:
  • The OML$CLUSTER_ID1 column contains the cluster with the highest probability.
  • The OML$DETAIL1 column contains the most important column in the prediction details.
  • The OML$CLUSTER_ID2 column contains the cluster with the second-highest probability.
  • The OML$DETAIL2 column contains the second most important column in the prediction details.

Update, Disable, and Delete a Batch Scoring Job

OML Services interacts with DBMS_SCHEDULER to perform actions on jobs. The following four actions are available:
  • DISABLE: This action disables the job. After an enabled job is disabled, it no longer runs according to its schedule.

    Note:

    Jobs can be set to DISABLED at submission by setting the disableJob flag to true.
  • ENABLE: This action enables a job. After a disabled job is enabled, the scheduler runs the job according to its schedule.
  • RUN: This action immediately runs the job as a way to test it or run it outside of its schedule.
  • STOP: This action stops a currently running job.

After your job is successfully submitted, it is enabled by default. This means that it runs according to the schedule specified at submission unless you change its state, for example to DISABLED. To change the state of a job, send a POST request with the action to the /omlmod/v1/jobs/{jobid}/action endpoint.
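Only the four actions listed above are accepted. The following is a minimal bash sketch of a helper that validates the action name before it builds the request body. The function name action_body is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body for the /action endpoint.
# Returns a non-zero status for anything other than the four listed actions.
action_body() {
  case "$1" in
    DISABLE|ENABLE|RUN|STOP)
      printf '{ "action": "%s" }\n' "$1" ;;
    *)
      echo "unsupported action: $1" >&2
      return 1 ;;
  esac
}
```

You could pass the result to cURL with --data "$(action_body DISABLE)". A typo such as DISABLED then fails locally instead of reaching the service.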

Disable a Batch Scoring Job

Here is an example of a request that disables the job:

$ curl -i -X POST "<oml-cloud-service-location-url>/omlmod/v1/jobs/${jobid}/action" \
       --header "Authorization: Bearer ${token}" \
       --header 'Content-Type: application/json' \
       --data '{ "action": "DISABLE" }'

Note:

A successful action request receives a 204 response with no body.
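Because the response has no body, a script must check the HTTP status code instead. One way to do this is to have cURL print only the status code by using -s -o /dev/null -w '%{http_code}', and then compare the result with 204. The following is a minimal bash sketch that assumes the action request is sent as a POST. The function name is_no_content is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for HTTP 204 (No Content).
is_no_content() {
  [ "$1" = "204" ]
}

# Example usage (requires network access and a valid token):
# status=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
#            "<oml-cloud-service-location-url>/omlmod/v1/jobs/${jobid}/action" \
#            --header "Authorization: Bearer ${token}" \
#            --header 'Content-Type: application/json' \
#            --data '{ "action": "DISABLE" }')
# is_no_content "$status" || echo "Action failed with HTTP status $status" >&2
```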

Update a Batch Scoring Job

After you submit a batch scoring job, you can update its schedule. To do this, send a POST request with the updated options to the /omlmod/v1/jobs/{jobId} endpoint.

You can update these parameters of a batch scoring job:
  • jobStartDate
  • jobEndDate
  • repeatInterval
  • maxRuns
In this example, the parameters are updated with values that differ from those set in the original request:
$ curl -i -X POST "<oml-cloud-service-location-url>/omlmod/v1/jobs/${jobid}" \
     --header "Authorization: Bearer ${token}" \
     --header 'Content-Type: application/json' \
     --data '{
         "jobSchedule": {
             "jobStartDate": "2023-03-28T21:00:00Z",
             "jobEndDate":   "2023-03-30T21:00:00Z",
             "repeatInterval": "FREQ=DAILY",
             "maxRuns": "2"
         }
     }'

Note:

A successful update request receives a 204 response with no body.
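The dates in this example use the UTC form YYYY-MM-DDThh:mm:ssZ. If you update schedules from a script, you can check the values before you build the request body. The following is a minimal bash sketch. The function names is_utc_timestamp and schedule_body are hypothetical. The format check verifies only the shape of the string, not that the date itself is valid.

```shell
#!/usr/bin/env bash
# Hypothetical check: does the string have the shape 2023-03-28T21:00:00Z?
is_utc_timestamp() {
  local re='^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'
  [[ "$1" =~ $re ]]
}

# Hypothetical helper: print a jobSchedule update body.
# Arguments: start date, end date, repeat interval, max runs.
schedule_body() {
  local start="$1" end="$2" interval="$3" max_runs="$4"
  is_utc_timestamp "$start" || { echo "bad start date: $start" >&2; return 1; }
  is_utc_timestamp "$end"   || { echo "bad end date: $end" >&2; return 1; }
  printf '{"jobSchedule":{"jobStartDate":"%s","jobEndDate":"%s","repeatInterval":"%s","maxRuns":"%s"}}\n' \
    "$start" "$end" "$interval" "$max_runs"
}
```

You could pass the result to cURL with --data "$(schedule_body 2023-03-28T21:00:00Z 2023-03-30T21:00:00Z FREQ=DAILY 2)".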

Delete a Batch Scoring Job

To delete a previously submitted job, send a DELETE request to the /omlmod/v1/jobs/{jobId} endpoint, where jobId is the ID of the job.

Here is an example of a DELETE request to delete a previously submitted job:
$ curl -X DELETE "<oml-cloud-service-location-url>/omlmod/v1/jobs/${jobid}"  \
       --header 'Accept: application/json' \
       --header 'Content-Type: application/json' \
       --header "Authorization: Bearer ${token}" | jq