Long Running Operations through REST

This document describes the general concepts of performing long running operations following RESTful principles. More specifically, it deals with the category of long running operations that produce and/or work on data file sets.

What is a Long Running Operation?

A long running operation is a well known business operation that typically takes a considerable amount of time to complete. It is hard to give a precise definition of "a considerable amount of time"; in practice it means that a user who requests such an operation does not want to be blocked while waiting for the result, and would rather continue with other work. Therefore the actual business operation is performed asynchronously from the initiating request.

Asynchronous processing and RESTful principles

Oracle Health Insurance follows well established RESTful principles to scaffold these kinds of asynchronously processed operations. In short it:

  • Exposes a business specific resource per well known, long running business operation.

  • Leverages HTTP response codes and hypermedia links allowing different levels of monitoring and control.

Business specific resources

Oracle Health Insurance uses a business specific resource per well known, long running business operation. This is primarily done to ensure that there is a considerable degree of freedom with respect to the technical mechanisms used for processing the operation asynchronously. A client of a business operation should be properly insulated from technicalities. Whether the operation is performed using the task processing framework, the activity processing framework or any other work distribution mechanism should have no functional impact. Instead the client interacts with the system RESTfully, by following a clear and simple protocol.

Having business specific resources for these types of business operations further allows these operations to be described in concrete terms and REST meta-data. Instead of having to understand internal data and generic payload structures (for instance to feed into activity requests), the input and output of these operations can be unambiguously defined.

Another benefit of business specific resources is that it allows for easier governance for these types of operations. Think about being able to easily define who is allowed to initiate a particular business operation as well as who has access to the complete audit trail of these operations.

Effectively every long running operation that follows this approach has a fairly straightforward interaction mechanism. In short:

  • A client performs an HTTP POST onto a business specific resource that fronts the desired long running operation.

    • The input for the HTTP POST operation is clearly defined in the resource's meta-data.

  • The system responds with HTTP 201 (Created) and a location header for the long running operation.
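The submission step above can be sketched in Python. This is a minimal sketch, not a product client library: the function name `operation_location` and the sample header values are assumptions made for illustration. It only interprets the status code and headers of a response that the client has already received, so it makes no network call.

```python
def operation_location(status_code, headers):
    """Return the Location of the newly created long running operation.

    status_code: HTTP status of the POST response.
    headers: mapping of response header names to values.
    """
    if status_code != 201:
        raise ValueError(f"expected 201 (Created), got {status_code}")
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    location = normalized.get("location")
    if not location:
        raise ValueError("201 response carries no Location header")
    return location


# Hypothetical response values, modeled on the protocol described above.
sample_headers = {
    "Content-Length": "0",
    "Location": "http://host:port/api/generic/readconsumptionbatches/12324",
}
monitor_url = operation_location(201, sample_headers)
```

Treating anything other than 201 as an error keeps the client from polling an address that was never created.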

From there a client has a number of choices, none of which are mutually exclusive. These are:

  • Go about its way under the anticipation that Oracle Health Insurance emits a notification event (with hypermedia links) once the long running operation has reached an end state.

  • Go about its way and then when it wants to learn more about the long running operation, search for it in the collection of long running operations.

  • Actively interact with the system (through the location header) in order to monitor the long running operation’s progress.

In the following sections we provide a bit more detail to each of these options as well as the hypermedia links that support each option.

Leveraging notification events

Notification events are sent by the system once a long running operation reaches an end state. As such notification events can be used as a means to construct a call-back, allowing a client that initiated the long running operation to continue with its flow of operation. It is crucially important that these notification events carry the right amount of information for such a call-back to make sense. This is the reason why these notification events embed hypermedia links. These are described per long running operation, but typically the following links are provided:

  • operator:subject

    A hypermedia link with rel="operator:subject" where the address refers to the long running operation resource. This can be used by the called back system in order to gain information about the context of the notification event. This link can be followed, for instance to get information about the operator (see next section for a more elaborate description).

  • datafileset

    A hypermedia link with rel="datafileset" where the address refers to the result of the long running operation. Please note that whether or not this link is present, depends on whether the long running operation actually produced a result.

  • Depending on the particular operation it is possible that more links are present.

Notification events may be described per long running operation or may have the following common notification structure.

Common Notification Structure

<notification correlationId="" workId="" status="">
  <links>
    <link rel='file' href='http://host:port/contextroot/datafilesets/{datafilesetcode}/datafiles/{datafilecode}/data'/>
    <link rel='file' href='http://host:port/contextroot/datafilesets/{datafilesetcode}/datafiles/{datafilecode}/data'/>
    ...
  </links>
  <fields>
    <key1>value1</key1>
    <key2>value2</key2>
  </fields>
</notification>

  • workId: The id of the invoking process. For example, if the notification originates from activity processing, this is the activityId.

  • correlationId: The correlation id, if one was present when the process for which this notification is sent was invoked.

  • status: Success or Failure when the notification is for a process such as activity processing. If the activity status is anything other than Completed, the status is Failure; otherwise it is Success.

  • links: In addition to the links explained above and any operation specific links, a link with rel="file" is added when the operation produces data files. This link can be followed to download the data files.

  • fields: Key/value pairs of data, if any, sent with each notification.
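As an illustration of how a receiving system might read the common notification structure, the sketch below parses it with Python's standard `xml.etree.ElementTree` module. The function name `parse_notification` and all values in the sample document (ids, codes, field names) are made up for this example; only the element and attribute names come from the structure shown above.

```python
import xml.etree.ElementTree as ET


def parse_notification(xml_text):
    """Parse a notification in the common structure into a plain dict."""
    root = ET.fromstring(xml_text.strip())
    links = [
        {"rel": link.get("rel"), "href": link.get("href")}
        for link in root.findall("./links/link")
    ]
    fields_element = root.find("fields")
    fields = {}
    if fields_element is not None:
        # Each child element name is a key, its text content the value.
        fields = {child.tag: child.text for child in fields_element}
    return {
        "workId": root.get("workId"),
        "correlationId": root.get("correlationId"),
        "status": root.get("status"),
        "links": links,
        "fields": fields,
    }


# Hypothetical sample; ids, codes and field names are invented.
sample = """
<notification correlationId="abc-1" workId="42" status="Success">
  <links>
    <link rel='file' href='http://host:port/contextroot/datafilesets/SET1/datafiles/FILE1/data'/>
  </links>
  <fields>
    <key1>value1</key1>
  </fields>
</notification>
"""
notification = parse_notification(sample)
# The hrefs a client would follow to download the produced data files.
file_links = [l["href"] for l in notification["links"] if l["rel"] == "file"]
```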

Leveraging ad-hoc collection query capabilities

Even though long running operations cater for call-back through notification events, and for interactive monitoring (described later), there might be valid reasons why a client does not need either. A canonical example would be a long running import process for which there is no logical continuation of the flow once it is done. It might still be relevant, however, to check the status or result of these long running operations at some later point, for instance for auditing and/or analysis purposes. As long running operation resources are built on top of standard HTTP API techniques, it is possible to perform regular collection queries. Depending on the status of the long running operation resources returned, the following links are embedded:

  • monitor

    A hypermedia link with rel="monitor" where the address refers to the long running operation resource. For a more complete description check the following section.

  • operator

    A hypermedia link with rel="operator" where the address refers to the (technical) operator that is tasked to execute the long running operation. A client can navigate this link to gain more information about the operator, for example to see about any problems that were encountered processing the request and whenever possible restart a failed operation.

Leveraging interactive monitoring of progress

Typically a client leverages notification events to stay informed about long running operations. There might be valid reasons for a client to deviate from this pattern and use the provided functionality to interactively monitor the progress of a long running operation. A potential use case would be a very targeted export of some Oracle Health Insurance data (e.g. ReadConsumptionBatch), constrained by input parameters that result in a very small result data set. Effectively this means that an inherently asynchronous execution process is made pseudo-blocking. The standard mechanism for this is called polling. To this end, following the successful initial submission of a long running operation, the Location header contains the monitoring link. A client can use this information to implement a polling behavior.

The response to a GET on this link can contain the monitor link:

  • monitor

    A hypermedia link with rel="monitor" where the address refers to the long running operation resource. Whenever this link is embedded in a response entity, it means that the operator that is tasked to execute the long running operation has not reached its end state. A client can use this information to implement a polling behavior.

The absence of the rel="monitor" link effectively means that there is no operator actively working on executing the long running operation. In that case a response entity can hold the following links:

  • operator

    A hypermedia link with rel="operator" where the address refers to the (technical) operator that was tasked to execute the long running operation. See description given earlier.

  • datafileset

    A hypermedia link with rel="datafileset" where the address refers to the result of the long running operation.
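The link semantics described in this section can be turned into a small decision helper. The following is a sketch under assumptions: the function names `find_link` and `next_step`, the action labels, and the sample entities are hypothetical, and the entity layout (a "links" array of objects with "rel" and "href") is taken from the JSON examples later in this document.

```python
def find_link(entity, rel):
    """Return the href of the first link with the given rel, or None."""
    for link in entity.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def next_step(entity):
    """Decide what a client can do next, based only on embedded links."""
    monitor = find_link(entity, "monitor")
    if monitor:
        # The operator has not reached its end state: keep polling.
        return ("poll", monitor)
    result = find_link(entity, "datafileset")
    if result:
        # End state reached and a result was produced.
        return ("fetch_result", result)
    operator = find_link(entity, "operator")
    if operator:
        # No result: ask the operator what happened.
        return ("inspect_operator", operator)
    return ("none", None)


# Hypothetical entities; ids and codes are invented.
in_progress = {"links": [
    {"rel": "monitor", "href": "http://host:port/api/generic/readconsumptionbatches/1"},
]}
finished = {"links": [
    {"rel": "operator", "href": "http://host:port/api/generic/tasks/2"},
    {"rel": "datafileset", "href": "http://host:port/api/datafilesets/SET1"},
]}
failed = {"links": [
    {"rel": "operator", "href": "http://host:port/api/generic/tasks/3"},
]}
```

Deciding on link presence rather than on a status value keeps the client aligned with the hypermedia contract described above.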

Interrogating the operator

In the previous section the hypermedia link with rel="operator" came up a couple of times. Whenever this link relation is embedded in the long running operation (response) entity, it means that a client can interact with it. There are actually two primary reasons why one might do this.

  • It can provide more information as to the messages that were registered by the operator, during execution of the long running operation. Whenever the rel="operator:messages" is present in the operator response entity those messages can be obtained.

  • It potentially allows for a failed operator to be restarted. For long running operations that have run into some kind of issue, it might be possible for these operations to be restarted once the cause of the problem has been resolved. Whenever the rel="operator:restart" is present in the operator response entity a restart can be attempted.
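A client can derive the available operator interactions from the links in the operator response entity. The sketch below assumes the operator entity is already decoded into a Python dict; the function name `operator_capabilities`, the keys of the returned dict, and the sample hrefs are illustrative assumptions.

```python
def operator_capabilities(operator_entity):
    """Map the operator's embedded links to the actions a client may take."""
    by_rel = {link["rel"]: link for link in operator_entity.get("links", [])}
    return {
        # Present when messages registered by the operator can be obtained.
        "messages_url": by_rel.get("operator:messages", {}).get("href"),
        # Present only when a restart can be attempted.
        "restart_url": by_rel.get("operator:restart", {}).get("href"),
        "can_restart": "operator:restart" in by_rel,
    }


# Hypothetical operator entity for an operation that ran into an issue.
failed_operator = {
    "links": [
        {"rel": "self", "href": "http://host:port/api/generic/tasks/7"},
        {"rel": "operator:messages", "href": "http://host:port/api/generic/interfacedmessages/8"},
        {"rel": "operator:restart", "href": "http://host:port/api/taskprocessing/7/restart"},
    ]
}
capabilities = operator_capabilities(failed_operator)
```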

A complete example - ReadConsumptionBatch

In this section we’ll provide a complete example of a long running operation. The example details the execution of ReadConsumptionBatch - a business operation with the goal of producing a data file set with an export of consumption and counter information, matching some input criteria.

The various scenarios of this example are illustrated by the output of the interaction at protocol wire level. Effectively we are peeking under the hood to see what is going on at the HTTP level. This provides insight into, for instance, the HTTP response codes that are returned.

Starting the long running operation

An HTTP POST is submitted to /readconsumptionbatches:

POST http://localhost:9998/readconsumptionbatches
Accept: application/vnd.oracle.insurance.resource+json,application/vnd.oracle.insurance.resource+json
Authorization: user testuser1
Content-Type: application/vnd.oracle.insurance.resource+json
{
   "transactionEndDateTime" : {
      "value" : "1970-10-11T14:04:42.509+01:00"
   },
   "transactionStartDateTime" : {
      "value" : "1970-10-11T14:04:41+01:00"
   },
   "limitCodes" : [ "VERT" ]
}

In this example a consumption export is requested for consumptions/counters related to the limit with code "VERT", within a transaction period bounded by a start and an end date-time. The response of the system looks something like:

201
Content-Length: 0
Content-Type: application/vnd.oracle.insurance.resource+json
Vary: Accept,Accept-Encoding,Accept-Language,Origin
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-OHI-APP-ID: ALI
X-OHI-APP-VERSION: 6.0.0
X-XSS-Protection: 1; mode=block
Location: http://host:port/api/generic/readconsumptionbatches/12324

The response carries the HTTP response code 201 and a Location header that can be used to monitor the operation.

Interactively following the long running operation

In a typical polling approach a client would use the monitor link relation to invoke the related address after some period of time with an HTTP GET. This would then look something like this:

GET http://localhost:9998/generic/readconsumptionbatches/12324
Accept: application/vnd.oracle.insurance.resource+json

Depending on the status of the long running operation, the system again returns a response entity. While the long running operation is still in progress, the response entity embeds the rel="monitor" link and the HTTP response code is 200 (OK). This is also the response code once the long running operation has reached an end state. However, in that case the response entity is different.
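A polling loop built on this behavior could look like the sketch below. It is an illustration under assumptions, not a prescribed client: the function name `poll_until_end_state`, the interval and attempt limits, and the injected `fetch` callable (which is expected to perform the HTTP GET and return the decoded response entity) are all hypothetical. The demonstration at the end uses canned responses instead of a real server.

```python
import time


def poll_until_end_state(fetch, url, interval_seconds=5.0, max_attempts=60,
                         sleep=time.sleep):
    """Poll the long running operation until rel="monitor" disappears.

    fetch: callable taking a URL and returning the decoded response entity
           (a dict); injected so the sketch stays transport-agnostic.
    Returns the final response entity, or raises TimeoutError.
    """
    for attempt in range(max_attempts):
        entity = fetch(url)
        rels = {link.get("rel") for link in entity.get("links", [])}
        if "monitor" not in rels:
            return entity  # end state reached
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"operation at {url} still running after {max_attempts} polls")


# Demonstration with canned responses (hypothetical values, no network).
canned = iter([
    {"links": [{"rel": "monitor", "href": "http://host:port/op/1"}]},
    {"links": [{"rel": "monitor", "href": "http://host:port/op/1"}]},
    {"status": "Completed",
     "links": [{"rel": "datafileset", "href": "http://host:port/api/datafilesets/SET1"}]},
])
waits = []  # records the requested sleep intervals instead of sleeping
final_entity = poll_until_end_state(lambda _url: next(canned),
                                    "http://host:port/op/1",
                                    sleep=waits.append)
```

Bounding the number of attempts avoids a client that polls forever when an operation never leaves its in-progress state.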

In case the consumption export was successfully generated a response similar to the one below is returned:

200
Content-Length: 909
Content-Type: application/vnd.oracle.insurance.resource+json
Vary: Accept,Accept-Encoding,Accept-Language,Origin
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-OHI-APP-ID: ALI
X-OHI-APP-VERSION: 6.0.0
X-XSS-Protection: 1; mode=block
{
   "type" : "readConsumptionBatch",
   "id" : 12324,
   "createdBy" : 20,
   "lastUpdatedBy" : 20,
   "objectVersionNumber" : 1,
   "status" : "Completed",
   "links" : [ {
      "href" : "http://host:port/api/generic/readconsumptionbatches/12324",
      "rel" : "self"
   }, {
      "href" : "http://host:port/api/generic/tasks/19009761",
      "rel" : "operator",
      "httpMethod" : "GET"
   }, {
      "href" : "http://host:port/api/datafilesets/822F386E57B83BA4E053020011ACC5FC",
      "rel" : "datafileset",
      "httpMethod" : "GET"
   } ],
   "creationDate" : {
      "value" : "2019-02-18T18:11:51.004+01:00"
   },
   "lastUpdatedDate" : {
      "value" : "2019-02-18T18:11:51.004+01:00"
   },
   "transactionEndDateTime" : {
      "value" : "1970-10-11T14:04:42.509+01:00"
   },
   "transactionStartDateTime" : {
      "value" : "1970-10-11T14:04:41+01:00"
   },
   "limitCodes" : [ "VERT" ]
}

Most important here is the presence of the following link relation:

  • datafileset - this link relation actually refers to the produced end result of the long running operation.

This would then allow a client to traverse the given link to get to the actual result. This is described in the integration point for data file sets.

In contrast to when the long running operation produces an end result successfully, such a long running operation could potentially also run into issues. In that case the response entity payload looks different. A typical example of what this then looks like is shown below.

200
Content-Length: 587
Content-Type: application/vnd.oracle.insurance.resource+json
Vary: Accept,Accept-Encoding,Accept-Language,Origin
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-OHI-APP-ID: ALI
X-OHI-APP-VERSION: 6.0.0
X-XSS-Protection: 1; mode=block
{
   "type" : "readConsumptionBatch",
   "id" : 12328,
   "createdBy" : 20,
   "lastUpdatedBy" : 20,
   "objectVersionNumber" : 1,
   "status" : "Failed",
   "links" : [ {
      "href" : "http://host:port/api/generic/readconsumptionbatches/12328",
      "rel" : "self"
   }, {
      "href" : "http://host:port/api/generic/tasks/19009768",
      "rel" : "operator",
      "httpMethod" : "GET"
   } ],
   "creationDate" : {
      "value" : "2019-02-18T18:12:11.127+01:00"
   },
   "lastUpdatedDate" : {
      "value" : "2019-02-18T18:12:11.127+01:00"
   },
   "transactionEndDateTime" : {
      "value" : "1970-10-11T14:04:42.509+01:00"
   },
   "transactionStartDateTime" : {
      "value" : "1970-10-11T14:04:41+01:00"
   },
   "limitCodes" : [ "VERT" ]
}

Note that in this response entity the rel="monitor" link is missing, indicating that there is no operator actively working on the long running operation. Given that there is no rel="datafileset" link either, something else must be going on. The system exposes the rel="operator" link specifically for this purpose, allowing a client to gain more information. Below is an example of what following the rel="operator" link might look like:

200
Content-Length: 1253
Content-Type: application/vnd.oracle.insurance.resource+json
Vary: Accept,Accept-Encoding,Accept-Language,Origin
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-OHI-APP-ID: ALI
X-OHI-APP-VERSION: 6.0.0
X-XSS-Protection: 1; mode=block
{
   "type" : "task",
   "id" : 19009768,
   "createdBy" : 20,
   "lastUpdatedBy" : 20,
   "objectVersionNumber" : 1,
   "status" : "ERRORED",
   "subjectId" : 12328,
   "threadCode" : "adjulimits-inttest--1 ( main )",
   "links" : [ {
      "href" : "http://host:port/api/generic/tasks/19009768",
      "rel" : "self"
   }, {
      "href" : "http://host:port/api/generic/interfacedmessages/19006453",
      "rel" : "operator:messages",
      "httpMethod" : "GET"
   }, {
      "href" : "http://host:port/api/taskprocessing/19009768/restart",
      "rel" : "operator:restart",
      "httpMethod" : "POST"
   }, {
      "href" : "http://host:port/api/generic/readconsumptionbatches/12328",
      "rel" : "operator:subject",
      "httpMethod" : "GET"
   } ],
   "creationDate" : {
      "value" : "2019-02-18T18:12:11.127+01:00"
   },
   "lastUpdatedDate" : {
      "value" : "2019-02-18T18:12:11.127+01:00"
   },
   "table" : {
      "id" : 2109405,
      "links" : [ {
         "href" : "http://host:port/api/generic/tables/2109405",
         "rel" : "canonical"
      } ]
   },
   "taskType" : {
      "id" : 8520,
      "links" : [ {
         "href" : "http://host:port/api/generic/tasktypes/8520",
         "rel" : "canonical"
      } ]
   }
}

A couple of link relations are noteworthy here:

  • operator:messages - can be used to get more information about the messages that have been registered by the operator while executing the long running operation.

  • operator:restart - can be used to try and restart the failed operator.

  • operator:subject - effectively this provides the address of the long running operation resource - which is the subject of this operator’s execution.

A client that wants to restart the failed operator can issue an HTTP POST request to the operator:restart address, like so:

POST http://localhost:9998/taskprocessing/19009768/restart
Accept: application/vnd.oracle.insurance.resource+json

Which results in a response similar to this:

200
Content-Length: 1141
Content-Type: application/vnd.oracle.insurance.resource+json
Vary: Accept,Accept-Encoding,Accept-Language,Origin
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-OHI-APP-ID: ALI
X-OHI-APP-VERSION: 6.0.0
X-XSS-Protection: 1; mode=block
{
   "type" : "task",
   "id" : 19009768,
   "createdBy" : 20,
   "currentAttemptCount" : 0,
   "lastUpdatedBy" : 20,
   "objectVersionNumber" : 3,
   "status" : "PENDING",
   "subjectId" : 12328,
   "threadCode" : "adjulimits-inttest--1 ( main )",
   "links" : [ {
      "href" : "http://host:port/api/generic/tasks/19009768",
      "rel" : "self"
   }, {
      "href" : "http://host:port/api/generic/interfacedmessages/19006453",
      "rel" : "operator:messages",
      "httpMethod" : "GET"
   }, {
      "href" : "http://host:port/api/generic/readconsumptionbatches/12328",
      "rel" : "operator:subject",
      "httpMethod" : "GET"
   } ],
   "creationDate" : {
      "value" : "2019-02-18T18:12:11.127+01:00"
   },
   "lastUpdatedDate" : {
      "value" : "2019-02-18T18:12:11.253+01:00"
   },
   "table" : {
      "id" : 2109405,
      "links" : [ {
         "href" : "http://host:port/api/generic/tables/2109405",
         "rel" : "canonical"
      } ]
   },
   "taskType" : {
      "id" : 8520,
      "links" : [ {
         "href" : "http://host:port/api/generic/tasktypes/8520",
         "rel" : "canonical"
      } ]
   }
}

From that response entity a typical next step would be to get back to the original subject by following the rel="operator:subject" hypermedia link, and then continue monitoring the long running operation.
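The restart step can be prepared from the operator entity itself, so that the client never hard-codes the restart address. The sketch below builds, but does not send, the request using the standard `urllib.request` module. The function name `build_restart_request` and the sample entities are assumptions for illustration, and authentication headers are deliberately left out because they depend on the deployment.

```python
import urllib.request

MEDIA_TYPE = "application/vnd.oracle.insurance.resource+json"


def build_restart_request(operator_entity):
    """Return a prepared request for rel="operator:restart", or None."""
    for link in operator_entity.get("links", []):
        if link.get("rel") == "operator:restart":
            return urllib.request.Request(
                link["href"],
                # Use the advertised method; the examples above show POST.
                method=link.get("httpMethod", "POST"),
                headers={"Accept": MEDIA_TYPE},
            )
    return None  # the operator does not advertise a restart


# Hypothetical operator entities, shaped like the responses shown above.
errored_task = {"links": [
    {"rel": "operator:restart", "httpMethod": "POST",
     "href": "http://localhost:9998/taskprocessing/19009768/restart"},
    {"rel": "operator:subject", "httpMethod": "GET",
     "href": "http://localhost:9998/generic/readconsumptionbatches/12328"},
]}
pending_task = {"links": [
    {"rel": "operator:subject", "httpMethod": "GET",
     "href": "http://localhost:9998/generic/readconsumptionbatches/12328"},
]}
restart_request = build_restart_request(errored_task)
```

Passing `restart_request` to `urllib.request.urlopen` would send it. Once the operator has been restarted, the link is no longer embedded (as in the response above), and the helper returns None.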

Leveraging notification events

When a client opts out of following the long running operation interactively through polling, a notification event is produced once the long running operation reaches an end state. Such a notification event looks something like this:

{
  "type"  : "ReadConsumptionBatch",
  "topic" : "Notification",
  "timestamp" : {
     "value" : "2019-02-18T18:12:00.325+01:00"
  },
  "links" : [
     { "rel" : "operator:subject",
       "href" : "http://host:port/api/generic/readconsumptionbatches/12323"
     },
     { "rel" : "datafileset",
       "href" : "http://host:port/api/datafilesets/822F386E57B63BA4E053020011ACC5FC"
     },
     { "rel" : "file",
       "href" : "http://host:port/api/datafilesets/822F386E57B63BA4E053020011ACC5FC/datafiles/822F386E57B73BA4E053020011ACC5FC/data"
     }
  ]
}

The structure of a notification event is fairly straightforward. The following fields are present:

  • type

    This denotes the long running operation type for which this event was emitted by the system.

  • topic

    This classifies the event into a specific category - in this case it is a notification event.

  • timestamp

    This is a timestamp at which the event got generated by the system.

  • links

    This contains a collection of links. Which links are present depends on the long running operation.

In the above example the following links are provided:

  • operator:subject

    This link is always present in a notification event for long running operations. It provides the resource location of the long running operation for which this notification event was emitted. As such it provides the ability to reach back into Oracle Health Insurance to obtain all the information. From that link the conversation can be continued in the same way as described earlier.

  • datafileset

    This link is present when the long running operation actually produced an (end) result.

  • file

    This link is additionally included particularly for the ReadConsumptionBatch. It allows a calling system to quickly gain access to the stream containing the consumption and counter export.
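A system that receives such a notification event might reduce it to the link targets it needs for the call-back. The sketch below is one possible shape for that, assuming the event arrives as a JSON string. The function name `summarize_event`, the keys of the returned dict, and the sample values are hypothetical; the field and rel names follow the event shown above.

```python
import json


def summarize_event(raw_event):
    """Extract the link targets a called-back client typically needs."""
    event = json.loads(raw_event)
    hrefs = {}
    for link in event.get("links", []):
        # Collect per rel; a rel such as "file" may occur more than once.
        hrefs.setdefault(link["rel"], []).append(link["href"])
    return {
        "type": event.get("type"),
        "subject": hrefs.get("operator:subject", [None])[0],
        # The datafileset link is only present when a result was produced.
        "has_result": "datafileset" in hrefs,
        "files": hrefs.get("file", []),
    }


# Hypothetical event; ids and codes are invented.
raw = json.dumps({
    "type": "ReadConsumptionBatch",
    "topic": "Notification",
    "links": [
        {"rel": "operator:subject",
         "href": "http://host:port/api/generic/readconsumptionbatches/1"},
        {"rel": "datafileset",
         "href": "http://host:port/api/datafilesets/SET1"},
        {"rel": "file",
         "href": "http://host:port/api/datafilesets/SET1/datafiles/FILE1/data"},
    ],
})
summary = summarize_event(raw)
```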