31 Managing Thread Execution

This chapter provides instructions for controlling the execution behavior of Coherence service threads using task timeouts and the PriorityTask API for custom execution processing.

This chapter includes the following sections:

Overview of Priority Tasks
Setting Priority Task Timeouts
Creating Priority Task Execution Objects

31.1 Overview of Priority Tasks

Coherence priority tasks give applications with critical response-time requirements better control over the execution of processes within Coherence. Execution and request timeouts can be configured to limit wait time for long-running threads. In addition, a custom task API allows applications to control queue processing. Use these features with extreme caution because they can dramatically affect the performance and throughput of the data grid.

31.2 Setting Priority Task Timeouts

Take care when configuring Coherence task execution timeouts, especially for Coherence applications that pre-date this feature and therefore do not handle timeout exceptions. For example, if a write-through in a CacheStore is blocked and exceeds the configured timeout value, the Coherence task manager attempts to interrupt the executing thread and an exception is thrown. Similarly, queries or aggregations that exceed configured timeouts are interrupted and an exception is thrown. Applications that use this feature must handle these exceptions correctly to preserve system integrity. Because timeouts are configured service by service, change these settings with great care on existing caches and services that were not designed with this feature in mind.
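The interrupt-on-timeout behavior described above can be illustrated without Coherence. The following is a minimal sketch that uses only java.util.concurrent to show a caller giving up on a slow operation and interrupting the worker thread; it is not how Coherence implements timeouts, and SlowWriteDemo and runWithTimeout are hypothetical names chosen for this illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: simulates a worker that is interrupted after a timeout.
// SlowWriteDemo and runWithTimeout are hypothetical, not Coherence APIs.
class SlowWriteDemo {

    /** Returns "completed", or "timed out" if the work exceeds timeoutMillis. */
    static String runWithTimeout(long workMillis, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = pool.submit(() -> {
                Thread.sleep(workMillis);   // stands in for a blocked write-through
                return "completed";
            });
            try {
                return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                result.cancel(true);        // interrupts the worker thread
                return "timed out";         // the caller decides how to recover
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch the caller must decide what "timed out" means for its data: the interrupted work may have partially completed, which is why applications need explicit handling for timeout exceptions.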

31.2.1 Configuring Execution Timeouts

The <request-timeout>, <task-timeout>, and <task-hung-threshold> elements configure execution timeouts for a service's worker threads. These timeout settings are configured for a service in a cache configuration file and can also be set using command line parameters. See Using the Service Guardian for information on setting timeouts for service threads.

Table 31-1 describes the execution timeout elements.

Table 31-1 Execution Timeout Elements

Element Name	Description
`<request-timeout>`	Specifies the default timeout value for requests that can time out (for example, requests that implement the `PriorityTask` interface) but do not explicitly specify the request timeout value. The request time is measured on the client side as the time elapsed from the moment a request is sent for execution to the corresponding server node(s) and includes the following: the time it takes to deliver the request to an executing node (server); the interval between the time the task is received and placed into a service queue until the execution starts; the task execution time; and the time it takes to deliver a result back to the client. If the value does not contain a unit, a unit of milliseconds is assumed. Legal values are positive integers or zero (indicating no default timeout). The default value is an infinite timeout (`0s`) for clustered client requests and 30 seconds (`30s`) for non-clustered client requests.
`<task-timeout>`	Specifies the default timeout value for tasks that can be timed out (for example, entry processors that implement the `PriorityTask` interface) but do not explicitly specify the task execution timeout value. The task execution time is measured on the server side and does not include the time spent waiting in a service backlog queue before being started. This element is applied only if the thread pool is used (the `<thread-count-min>` and `<thread-count-max>` values are positive). If zero is specified, the default service-guardian `<timeout-milliseconds>` value is used.
`<task-hung-threshold>`	Specifies the amount of time in milliseconds that a task can execute before it is considered "hung". Note: a posted task that has not yet started is never considered hung. This element is applied only if the thread pool is used (the `<thread-count-min>` and `<thread-count-max>` values are positive).

The following distributed cache example explicitly configures the service's dynamic thread pool, a task timeout of 5000 milliseconds, and a task hung threshold of 10000 milliseconds:

<caching-schemes>
    <distributed-scheme>
      <scheme-name>example-distributed</scheme-name>
      <service-name>DistributedCache</service-name>
      <thread-count-min>7</thread-count-min>
      <thread-count-max>20</thread-count-max>
      <task-hung-threshold>10000</task-hung-threshold>
      <task-timeout>5000</task-timeout>
    </distributed-scheme>
</caching-schemes>

The following example sets the client request timeout to 15 seconds (15000 milliseconds):

<distributed-scheme>
  <scheme-name>example-distributed</scheme-name>
  <service-name>DistributedCache</service-name>
  <request-timeout>15000ms</request-timeout>
</distributed-scheme>

Note:

The <request-timeout> value should always be longer than both the <task-hung-threshold> value and the <task-timeout> value.
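The ordering rule in the note can be checked before a configuration is deployed. The following is a minimal sketch, not part of Coherence: TimeoutCheck, toMillis, and isConsistent are hypothetical helpers, and the parser is deliberately simplified to accept only a bare number (milliseconds assumed, as described in Table 31-1), an "ms" suffix, or an "s" suffix. It treats a request timeout of zero as "no timeout" and does not model the special meaning of a zero task timeout.

```java
// Hypothetical helper, not part of Coherence: checks that the request timeout
// is longer than both server-side limits.
class TimeoutCheck {

    /** Parses values such as "5000", "15000ms", or "30s" into milliseconds. */
    static long toMillis(String value) {
        String v = value.trim().toLowerCase();
        if (v.endsWith("ms")) {
            return Long.parseLong(v.substring(0, v.length() - 2).trim());
        }
        if (v.endsWith("s")) {
            return Long.parseLong(v.substring(0, v.length() - 1).trim()) * 1000L;
        }
        return Long.parseLong(v);   // no unit: milliseconds assumed
    }

    /** True when the request timeout exceeds the task timeout and hung threshold. */
    static boolean isConsistent(String requestTimeout, String taskTimeout,
                                String hungThreshold) {
        long request = toMillis(requestTimeout);
        if (request == 0L) {
            return true;            // zero means no request timeout
        }
        return request > toMillis(taskTimeout)
            && request > toMillis(hungThreshold);
    }
}
```

With the values used in the examples above, isConsistent("15000ms", "5000", "10000") returns true, whereas a request timeout of "4s" with the same server-side values would be rejected.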

31.2.2 Execution Timeout Command Line Options

Use command line options to set the default timeout values for each service type (such as distributed cache, invocation, and proxy) on a node. Table 31-2 describes the options.

Table 31-2 Command Line Options for Setting Execution Timeout

Option	Description
`coherence.replicated.request.timeout`	The default client request timeout for the Replicated cache service
`coherence.optimistic.request.timeout`	The default client request timeout for the Optimistic cache service
`coherence.distributed.request.timeout`	The default client request timeout for distributed cache services
`coherence.distributed.task.timeout`	The default server execution timeout for distributed cache services
`coherence.distributed.task.hung`	The default time before a thread is reported as hung by distributed cache services
`coherence.invocation.request.timeout`	The default client request timeout for invocation services
`coherence.invocation.task.hung`	The default time before a thread is reported as hung by invocation services
`coherence.invocation.task.timeout`	The default server execution timeout for invocation services
`coherence.proxy.request.timeout`	The default client request timeout for proxy services
`coherence.proxy.task.timeout`	The default server execution timeout for proxy services
`coherence.proxy.task.hung`	The default time before a thread is reported as hung by proxy services
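These options are typically supplied to the JVM as system properties, for example -Dcoherence.distributed.task.timeout=5000. The following sketch renders a set of options from Table 31-2 into that form, which can be useful when a launcher script or test harness assembles the command line. TimeoutOptions and toJvmArgs are hypothetical names, and the values shown in the usage note are examples only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Coherence: turns option names and values
// into -D arguments for the java command line.
class TimeoutOptions {

    /** Renders each entry as "-Dname=value", preserving the map's iteration order. */
    static List<String> toJvmArgs(Map<String, String> options) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> option : options.entrySet()) {
            args.add("-D" + option.getKey() + "=" + option.getValue());
        }
        return args;
    }
}
```

For example, a map containing coherence.distributed.request.timeout=15000 and coherence.distributed.task.timeout=5000 yields the two arguments -Dcoherence.distributed.request.timeout=15000 and -Dcoherence.distributed.task.timeout=5000.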

31.3 Creating Priority Task Execution Objects

The PriorityTask interface enables you to control the order in which a service schedules tasks for execution using a thread pool and to hold the task execution time to a specified limit. Instances of PriorityTask typically also implement either the Invocable or Runnable interface. Priority task execution is relevant only when a task backlog exists.

The API defines the following ways to schedule tasks for execution:

SCHEDULE_STANDARD: a task is scheduled for execution in natural order (based on the request arrival time).
SCHEDULE_FIRST: a task is scheduled in front of any tasks of equal or lower scheduling priority and is executed as soon as any worker thread becomes available.
SCHEDULE_IMMEDIATE: a task is immediately executed by any idle worker thread; if all worker threads are active, a new thread is created to execute the task.
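The effect of the first two scheduling options on a backlog can be visualized with a small simulation. The following sketch is not Coherence's scheduler: BacklogSimulation and its Priority enum are hypothetical, and the ordering rules are a direct reading of the descriptions above (standard tasks run in arrival order; a task scheduled first jumps ahead of every queued task of equal or lower priority). SCHEDULE_IMMEDIATE is not modeled because such tasks bypass the backlog.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative simulation only; names and structure are hypothetical.
class BacklogSimulation {

    enum Priority { STANDARD, FIRST }

    private static final class Entry {
        final String name;
        final Priority priority;
        final long seq;   // arrival order

        Entry(String name, Priority priority, long seq) {
            this.name = name;
            this.priority = priority;
            this.seq = seq;
        }
    }

    // Higher priority first; FIRST tasks newest-first, STANDARD tasks oldest-first.
    private final PriorityQueue<Entry> queue = new PriorityQueue<>(
        Comparator.<Entry>comparingInt(e -> -e.priority.ordinal())
                  .thenComparingLong(e -> e.priority == Priority.FIRST ? -e.seq : e.seq));

    private long nextSeq = 0;

    /** Adds a task to the simulated backlog. */
    void submit(String name, Priority priority) {
        queue.add(new Entry(name, priority, nextSeq++));
    }

    /** Removes all tasks and returns their names in execution order. */
    List<String> drain() {
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().name);
        }
        return order;
    }
}
```

Submitting a and b as STANDARD and then c and d as FIRST drains in the order d, c, a, b. With an empty backlog the priorities make no difference, because each task starts as soon as it arrives.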

31.3.1 APIs for Creating Priority Task Objects

Coherence provides the following classes to help create priority task objects:

PriorityProcessor can be extended to create a custom entry processor.
PriorityFilter can be extended to create a custom priority filter.
PriorityAggregator can be extended to create a custom aggregation.
PriorityTask can be implemented to create a priority invocation class.

After extending one of these classes, implement the methods described in Table 31-3. Store the return values for getRequestTimeoutMillis, getExecutionTimeoutMillis, and getSchedulingPriority in your application configuration parameters on a class-by-class basis.

Table 31-3 Methods to Support Task Timeout

Method	Description
`public long getRequestTimeoutMillis()`	Obtains the maximum amount of time a calling thread can wait for a result of the request execution. The request time is measured on the client side as the time elapsed from the moment a request is sent for execution to the corresponding server node(s) and includes: the time it takes to deliver the request to the executing node(s); the interval between the time the task is received and placed into a service queue until the execution starts; the task execution time; and the time it takes to deliver a result back to the client. The value of `TIMEOUT_DEFAULT` indicates a default timeout value configured for the corresponding service; the value of `TIMEOUT_NONE` indicates that the client thread can wait indefinitely until the task execution completes or is canceled by the service due to a task execution timeout specified by the `getExecutionTimeoutMillis()` value.
`public long getExecutionTimeoutMillis()`	Obtains the maximum amount of time this task is allowed to run before the corresponding service attempts to stop it. The value of `TIMEOUT_DEFAULT` indicates a default timeout value configured for the corresponding service; the value of `TIMEOUT_NONE` indicates that this task can execute indefinitely. If the task has not finished when the specified time has elapsed, the service attempts to stop the execution by using the `Thread.interrupt()` method. If interrupting the thread does not terminate the task, the `runCanceled` method is called.
`public int getSchedulingPriority()`	Obtains this task's scheduling priority. Valid values are `SCHEDULE_STANDARD`, `SCHEDULE_FIRST`, and `SCHEDULE_IMMEDIATE`.
`public void runCanceled(boolean fAbandoned)`	This method is called if and only if all attempts to interrupt this task were unsuccessful in stopping the execution or if the execution was canceled before it had a chance to run at all. Since this method is usually called on a service thread, implementors must exercise extreme caution since any delay introduced by the implementation causes a delay of the corresponding service.
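The methods in Table 31-3 might be implemented along the following lines. This is a sketch under stated assumptions: so that it compiles without the Coherence library, it declares a stand-in interface, PriorityTaskSketch, that mirrors only the four methods in the table; a real task would implement com.tangosol.net.PriorityTask (or extend one of the helper classes listed earlier) instead. The constant value in the stand-in is a placeholder for this illustration, and ReportTask is a hypothetical class name. The timeout and priority values are passed to the constructor so that they can come from application configuration.

```java
// Stand-in for illustration only; mirrors the methods in Table 31-3.
// The constant value below is a placeholder, not taken from Coherence.
interface PriorityTaskSketch {
    int SCHEDULE_FIRST = 1;

    long getRequestTimeoutMillis();
    long getExecutionTimeoutMillis();
    int getSchedulingPriority();
    void runCanceled(boolean fAbandoned);
}

// Hypothetical task whose limits come from application configuration.
class ReportTask implements PriorityTaskSketch, Runnable {
    private final long requestTimeoutMillis;
    private final long executionTimeoutMillis;
    private final int schedulingPriority;
    private volatile boolean canceled;
    private int completedSteps;

    ReportTask(long requestTimeoutMillis, long executionTimeoutMillis,
               int schedulingPriority) {
        this.requestTimeoutMillis = requestTimeoutMillis;
        this.executionTimeoutMillis = executionTimeoutMillis;
        this.schedulingPriority = schedulingPriority;
    }

    public long getRequestTimeoutMillis()   { return requestTimeoutMillis; }
    public long getExecutionTimeoutMillis() { return executionTimeoutMillis; }
    public int getSchedulingPriority()      { return schedulingPriority; }

    // Only records the cancellation; anything slow here would delay the service.
    public void runCanceled(boolean fAbandoned) { canceled = true; }

    // Checks the interrupt flag between steps so a timeout can stop the task.
    public void run() {
        for (int i = 0; i < 3 && !Thread.currentThread().isInterrupted(); i++) {
            completedSteps++;
        }
    }

    boolean wasCanceled()   { return canceled; }
    int getCompletedSteps() { return completedSteps; }
}
```

Checking the interrupt flag inside run() gives the service's Thread.interrupt() call a chance to end the task cleanly, so that runCanceled is needed only as a last resort.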

31.3.2 Errors Thrown by Task Timeouts

When a task timeout occurs, the node that issued the request receives a RequestTimeoutException. For example:

com.tangosol.net.RequestTimeoutException: Request timed out after 4015 millis
        at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.checkRequestTimeout(Service.CDB:8)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.poll(Service.CDB:52)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.poll(Service.CDB:18)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.InvocationService.query(InvocationService.CDB:17)
        at com.tangosol.coherence.component.util.safeService.SafeInvocationService.query(SafeInvocationService.CDB:1)
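One way an application might handle such a timeout is to retry a bounded number of times and then fall back to a default result. The following is a minimal sketch, not a recommended policy: TimeoutRetry and callWithRetry are hypothetical, and the nested RequestTimedOut class is a stand-in for com.tangosol.net.RequestTimeoutException so that the example compiles without the Coherence library.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Coherence.
class TimeoutRetry {

    /** Stand-in for com.tangosol.net.RequestTimeoutException. */
    static class RequestTimedOut extends RuntimeException {
        RequestTimedOut(String message) {
            super(message);
        }
    }

    /**
     * Runs the request up to maxAttempts times and returns the fallback
     * value if every attempt times out.
     */
    static <T> T callWithRetry(Supplier<T> request, int maxAttempts, T fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.get();
            } catch (RequestTimedOut e) {
                // A real application would log the timeout here before retrying.
            }
        }
        return fallback;
    }
}
```

Retrying is only safe for requests that can be repeated without side effects, because a request that timed out on the client may still have executed on the server.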