Example: Batch Scoring using a Clustering Model
This is an example of a batch scoring job submission for a K-Means clustering model. OML Services supports batch scoring for regression, classification, clustering, and feature extraction models.
You require the model ID of the model to be used for scoring. You can get the model ID by sending a GET request to the deployment endpoint and specifying the model URI. The model URI is provided by the user when deploying the model using the AutoML UI or when deploying the model through a REST client.
Batch Scoring Workflow
- Deploy the model through the AutoML UI
- Authenticate the database user and obtain the access token
- Get the Model ID of the model to be used for scoring
- Create a batch scoring job
- View the details and output of the batch scoring job
- Update, disable, and delete a batch scoring job
Note:
To run REST calls, use cURL in a Linux or Mac terminal, or another REST client such as Postman.
1: Deploy a Model
- In the AutoML UI, create an AutoML experiment if you want to build machine learning models automatically.
- Deploy a model.
2: Authenticate the Database User and Obtain the Access Token
You must obtain an authentication token by using your Oracle Machine Learning (OML) account credentials to send requests to OML Services. To authenticate and obtain a token, use cURL with the -d option to pass the credentials for your Oracle Machine Learning account against the Oracle Machine Learning user management cloud service REST endpoint /oauth2/v1/token. Run the following command to obtain the access token:
$ curl -X POST --header 'Content-Type: application/json' \
--header 'Accept: application/json' \
-d '{"grant_type":"password", "username":"<yourusername>", "password":"<yourpassword>"}' \
"<oml-cloud-service-location-url>/omlusers/api/oauth2/v1/token"
In this command:
- -X POST specifies to use a POST request when communicating with the HTTP server
- --header defines the headers required for the request (application/json)
- -d sends the username and password authentication credentials as data in a POST request to the HTTP server
- Content-Type defines the request format (JSON)
- Accept defines the response format (JSON)
- yourusername is the user name of an Oracle Machine Learning user with the default OML_DEVELOPER role
- yourpassword is the password for the user name
- oml-cloud-service-location-url is a URL containing the REST server portion of the Oracle Machine Learning User Management Cloud Service instance URL that includes the tenancy ID and database name. You can obtain the URL from the Development tab in the Service Console of your Oracle Autonomous Database instance.
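Later commands in this example reference the token as ${token}. A minimal sketch of capturing it in a shell variable follows. It assumes that the token response is a JSON object with an accessToken field and that jq is installed. OML_URL, OML_USER, and OML_PASS are hypothetical environment variables that hold the values described above, and token_payload and fetch_token are illustrative helper names.

```shell
# Build the credentials payload from variables instead of editing the JSON by hand.
# Note: values that contain double quotes or backslashes would need JSON escaping.
token_payload() {
  printf '{"grant_type":"password","username":"%s","password":"%s"}' "$1" "$2"
}

# Request a token and print it.
# Assumption: the response body looks like {"accessToken":"...", ...}.
fetch_token() {
  curl -s -X POST "${OML_URL}/omlusers/api/oauth2/v1/token" \
    --header 'Content-Type: application/json' \
    --header 'Accept: application/json' \
    -d "$(token_payload "$OML_USER" "$OML_PASS")" | jq -r '.accessToken'
}

# Usage (not run here):
#   export token="$(fetch_token)"
```

If the field name differs in your environment, inspect the raw response once with jq and adjust the filter.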
3: Get the Model ID
To get the modelId, send a GET request to the deployment endpoint and specify the model URI.
Note:
The model URI is provided by the user when deploying the model using the AutoML UI or when deploying the model through a REST client.
GET request to obtain the modelId:
$ curl -X GET "<oml-cloud-service-location-url>/omlmod/v1/deployment/KM_CLUS_MOD" \
--header "Authorization: Bearer ${token}" | jq '.modelId'
In this example, the model URI is KM_CLUS_MOD.
The GET request returns the following:
"4632a963-4340-4ca5-ba30-aff77a4b857a"
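The jq filter above prints the ID together with its JSON quotes. To reuse the ID in later requests, jq's -r (raw output) option strips them. The following is a sketch that assumes the token variable is already set. OML_URL is a hypothetical environment variable, and the helper names are illustrative.

```shell
# Print the modelId field of a deployment response (read from stdin) without quotes.
model_id_from_response() {
  jq -r '.modelId'
}

# Look up the model ID for a model URI such as KM_CLUS_MOD.
fetch_model_id() {
  curl -s -X GET "${OML_URL}/omlmod/v1/deployment/$1" \
    --header "Authorization: Bearer ${token}" | model_id_from_response
}

# Usage (not run here):
#   modelid="$(fetch_model_id KM_CLUS_MOD)"
```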
4: Create and Submit a Batch Scoring Job
After obtaining the access token, you can send a POST request to OML Services to create a batch scoring job.
Batch scoring jobs are initiated by sending a POST request to the /omlmod/v1/jobs endpoint. The details for batch scoring are specified in the jobProperties parameter, which includes required and optional parameters.
Required Parameters
- jobName specifies the name of the submitted job.
- jobType specifies the type of job to be run, which is MODEL_SCORING for batch scoring jobs.
- inputData is the name of the table or view to read input for the batch scoring job.
- outputData is the name of the results table. The batch scoring results are stored in a table named in the format {jobId}_{outputData}.
- modelId is the ID for the model used for scoring.
- supplementalColumnNames is an array of columns from the input table or view that is used to identify rows in the output table.
Optional Parameters
- jobStartDate is the start date and time for the job run. If not specified, the job runs immediately.
- jobEndDate is the end date and time for the job run.
- disableJob is a flag to disable the job at submission. If not set, the job is enabled at submission.
- maxRuns is the maximum number of times the job runs according to the schedule.
- inputSchemaName is the database schema that owns the input table or view. If not specified, the input schema is the schema of the user that obtained the token.
- outputSchemaName is the database schema that owns the output table. If not specified, the output schema is the same as the input schema.
- jobDescription is a description of the job provided by the user.
- jobServiceLevel is the database service level for the job, which can be LOW, MEDIUM, or HIGH.
- recompute is a flag that indicates whether to replace the output table. The default is true.
- topN filters the results for classification by returning the highest N probabilities, or for clustering, by returning the N most probable cluster assignments.
- topNDetails provides prediction details contributing to the probability scores.
The following request creates a batch scoring job:
# JSON has no comment syntax, so the request body carries no inline annotations.
# Each field is described in the lists around this example.
$ curl -X POST "<oml-cloud-service-location-url>/omlmod/v1/jobs" \
--header "Authorization: Bearer ${token}" \
--header 'Content-Type: application/json' \
--data '{
"jobSchedule": {
"jobStartDate": "2023-03-23T20:58:46Z",
"jobEndDate": "2023-03-28T20:58:46Z",
"repeatInterval": "FREQ=DAILY",
"maxRuns": "4"
},
"jobProperties": {
"jobName": "KM_CLUS_MOD1",
"jobType": "MODEL_SCORING",
"modelId": "4632a963-4340-4ca5-ba30-aff77a4b857a",
"inputData": "CUSTOMERS360",
"outputData": "KM_CLUS_PRED1",
"supplementalColumnNames": ["CUST_ID"],
"topN": 2,
"topNDetails": 2,
"recompute": "true"
}
}' | jq
In this example:
- The model uses the CUSTOMERS360 dataset to predict the cluster membership by customer. If you want to reproduce the example here, you must create the CUSTOMERS360 table. For more information, see Create the CUSTOMERS360 Table.
- The model URI is KM_CLUS_MOD.
- modelId is 4632a963-4340-4ca5-ba30-aff77a4b857a.
- jobSchedule is set to run daily, with a maximum of 4 runs.
- inputData is CUSTOMERS360, the table to read input for the batch scoring job.
- outputData is KM_CLUS_PRED1, an output table to store the job results.
- supplementalColumnNames lists CUST_ID, which is returned with the predictions to identify cluster membership.
- topN and topNDetails are both set to 2 to request the two most probable cluster assignments and the details contributing to the cluster assignment predictions.
- recompute is set to true to replace the results table with each run.
Response of the Job Request
A successfully submitted job request returns a response that contains a jobId. Note the jobId, because you need it to submit requests such as retrieving job details or performing an action on the job. Here is an example of a response:
{
"jobId": "OML$65977733_A951_42CD_9E3D_5E800604EBDB",
"links": [
{
"rel": "self",
"href": "<oml-cloud-service-location-url>/jobs/OML%2465977733_A951_42CD_9E3D_5E800604EBDB"
}
]
}
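Note that the href in the response encodes the $ in the job ID as %24. If you build job URLs in a script, the same encoding can be applied with a small helper. This is a sketch: encode_job_id is a hypothetical name, and it handles only the $ character.

```shell
# Percent-encode the "$" in an OML job ID so that it can be placed in a URL path.
encode_job_id() {
  printf '%s' "$1" | sed 's/\$/%24/g'
}

# Single quotes keep the shell from treating the text after "$" as a variable name.
jobid='OML$65977733_A951_42CD_9E3D_5E800604EBDB'
encode_job_id "$jobid"   # prints OML%2465977733_A951_42CD_9E3D_5E800604EBDB
```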
5: View Details of the Batch Scoring Job
To view details of your submitted job, send a GET request to the /omlmod/v1/jobs/{jobID} endpoint. Here, jobID is the ID provided in response to the successful submission of your batch scoring job in the previous step.
Run the following commands to view job details:
- First, export the job ID to save it to a variable:
$ export jobid='OML$65977733_A951_42CD_9E3D_5E800604EBDB' # save job ID to variable
- Send a GET request to the /omlmod/v1/jobs/{jobID} endpoint:
$ curl -X GET "<oml-cloud-service-location-url>/omlmod/v1/jobs/${jobid}" \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${token}" | jq
Response of the Job Request
{
"jobId": "OML$65977733_A951_42CD_9E3D_5E800604EBDB",
"jobRequest": {
"jobSchedule": {
"jobStartDate": "2023-03-23T20:58:46Z",
"repeatInterval": "FREQ=DAILY",
"jobEndDate": "2023-03-28T20:58:46Z",
"maxRuns": 4
},
"jobProperties": {
"jobType": "MODEL_SCORING",
"inputSchemaName": null,
"outputSchemaName": null,
"outputData": "KM_CLUS_PRED1",
"jobDescription": null,
"jobName": "KM_CLUS_MOD1",
"disableJob": false,
"jobServiceLevel": null,
"inputData": "CUSTOMERS360",
"supplementalColumnNames": [
"CUST_ID"
],
"topN": 2,
"recompute": true,
"modelId": "4632a963-4340-4ca5-ba30-aff77a4b857a",
"topNDetails": 2
}
},
"jobStatus": "CREATED",
"dateSubmitted": "2023-03-23T20:54:53.248736Z",
"links": [
{
"rel": "self",
"href": "<oml-cloud-service-location-url>/omlmod/v1/jobs/OML%2465977733_A951_42CD_9E3D_5E800604EBDB"
}
],
"jobFlags": [],
"state": "SCHEDULED",
"enabled": true,
"runCount": 0,
"nextRunDate": "2023-03-23T20:58:46Z"
}
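When scripting against this endpoint, it is often enough to look at a few fields of the response, such as state, runCount, and nextRunDate. The following sketch condenses the response with jq and polls until the job has run at least once. The helper names, the OML_URL variable, and the 60-second polling interval are illustrative choices, not part of OML Services, and the loop has no timeout.

```shell
# Condense a job-details response (read from stdin) to one line.
job_summary() {
  jq -r '"\(.state) runCount=\(.runCount) nextRunDate=\(.nextRunDate)"'
}

# Fetch the details of job $1.
job_details() {
  curl -s -X GET "${OML_URL}/omlmod/v1/jobs/$1" \
    --header 'Accept: application/json' \
    --header "Authorization: Bearer ${token}"
}

# Poll until runCount is at least 1, printing a summary on each pass.
wait_for_first_run() {
  while :; do
    details="$(job_details "$1")"
    printf '%s\n' "$details" | job_summary
    [ "$(printf '%s' "$details" | jq -r '.runCount // 0')" -ge 1 ] && break
    sleep 60
  done
}

# Usage (not run here):
#   wait_for_first_run "$jobid"
```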
6: View the Batch Scoring Job Output
Once your job has run, you can view the results in the table you specified in your job request with the outputData parameter. The full name of the table is {jobId}_{outputData}. You can check if your job is finished by sending a request to view its details. For example, the following query returns the first five rows of the output table:
%sql
SELECT * FROM OML$65977733_A951_42CD_9E3D_5E800604EBDB_KM_CLUS_PRED1
ORDER BY CUST_ID
FETCH FIRST 5 ROWS ONLY;
In the output table:
- The OML$CLUSTER_ID1 column contains the cluster with the highest probability.
- The OML$DETAIL1 column contains the most important column in the prediction details.
- The OML$CLUSTER_ID2 column contains the cluster with the second highest probability.
- The OML$DETAIL2 column contains the second most important column in the prediction details.
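Because the output table name is derived from the job ID and the outputData value, it can be assembled in a script rather than typed by hand. A minimal sketch, with output_table_name as a hypothetical helper:

```shell
# Join the job ID and the outputData value into the results table name.
output_table_name() {
  printf '%s_%s' "$1" "$2"
}

table="$(output_table_name 'OML$65977733_A951_42CD_9E3D_5E800604EBDB' 'KM_CLUS_PRED1')"

# Print the query from the example above for that table.
printf 'SELECT * FROM %s ORDER BY CUST_ID FETCH FIRST 5 ROWS ONLY;\n' "$table"
```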
Update, Disable, and Delete a Batch Scoring Job
OML Services uses DBMS_SCHEDULER to perform actions on jobs. There are four options for actions:
- DISABLE: This action disables the job. After an enabled job is disabled, it no longer runs according to its schedule. Note: Jobs can be set to DISABLED at submission by setting the disableJob flag to true.
- ENABLE: This action enables a job. After a disabled job is enabled, the scheduler runs the job according to its schedule.
- RUN: This action immediately runs the job as a way to test it or run it outside of its schedule.
- STOP: This action stops a currently running job.
When your job is successfully submitted, its state is set to ENABLED by default. This means that it runs according to the schedule specified at submission unless it is updated to another state, such as DISABLED. You can do this by sending a request to the /omlmod/v1/jobs/{jobID}/action endpoint.
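A script that manages jobs can guard against typos by checking the action name before it sends the request. The sketch below does this for the four actions listed above. action_payload and job_action are hypothetical helpers, and OML_URL is an assumed environment variable that holds the service URL.

```shell
# Print the request body for a job action, or fail for an unknown action.
action_payload() {
  case "$1" in
    DISABLE|ENABLE|RUN|STOP) printf '{ "action": "%s" }' "$1" ;;
    *) echo "unknown action: $1" >&2; return 1 ;;
  esac
}

# Send action $2 to job $1. With --data, curl issues a POST request.
job_action() {
  body="$(action_payload "$2")" || return 1
  curl -i "${OML_URL}/omlmod/v1/jobs/$1/action" \
    --header "Authorization: Bearer ${token}" \
    --header 'Content-Type: application/json' \
    --data "$body"
}

# Usage (not run here):
#   job_action "$jobid" DISABLE
```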
Disable a Batch Scoring Job
Here is an example of an action that updates the job status to DISABLED:
$ curl -i -X POST "<oml-cloud-service-location-url>/omlmod/v1/jobs/${jobid}/action" \
--header "Authorization: Bearer ${token}" \
--header 'Content-Type: application/json' \
--data '{ "action": "DISABLE" }'
Note:
A successful request returns a 204 response with no body.
Update a Batch Scoring Job
After a batch scoring job is submitted, you have the option to update the job schedule. You can do this by sending a POST request with the updated options to the /omlmod/v1/jobs/{jobID} endpoint. The following jobSchedule fields can be updated:
- jobStartDate
- jobEndDate
- repeatInterval
- maxRuns
# No jq filter here: -i prints the response headers, and a successful update has no JSON body.
$ curl -i -X POST "<oml-cloud-service-location-url>/omlmod/v1/jobs/${jobid}" \
--header "Authorization: Bearer ${token}" \
--header 'Content-Type: application/json' \
--data '{
"jobSchedule": {
"jobStartDate": "2023-03-28T21:00:00Z",
"jobEndDate": "2023-03-30T21:00:00Z",
"repeatInterval": "FREQ=DAILY",
"maxRuns": "2"
}
}'
Note:
A successful request returns a 204 response with no body.
Delete a Batch Scoring Job
To delete a previously submitted job, send a DELETE request to the /omlmod/v1/jobs/{jobID} endpoint, where jobID is the ID of the job to delete.
$ curl -X DELETE "<oml-cloud-service-location-url>/omlmod/v1/jobs/${jobid}" \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${token}" | jq