16 Oracle Human Workflow Performance Tuning

This chapter describes how to tune Oracle Human Workflow for optimal performance. You can tune Oracle Human Workflow in these areas:

Section 16.1, "About Oracle Human Workflow"
Section 16.2, "Basic Tuning Considerations"
Section 16.3, "Improving Server Performance"
Section 16.4, "Completing Workflows Faster"
Section 16.5, "Tuning Identity Provider"
Section 16.6, "Tuning the Database"

16.1 About Oracle Human Workflow

Oracle Human Workflow is a service engine running in Oracle SOA Service Infrastructure that allows the execution of interactive human driven processes. A human workflow provides the human interaction support such as approve, reject, and reassign actions within a process or outside of any process. The Human Workflow service consists of a number of services that handle various aspects of human interaction with a business process.

For more information, see "Using the Human Workflow Service Component" in Oracle Fusion Middleware Developer's Guide for Oracle SOA Suite.

See also the Oracle Human Workflow web site at http://www.oracle.com/technology/products/soa/hw/index.html.

16.2 Basic Tuning Considerations

This section discusses the various options available to address performance issues:

Minimize Client Response Time
Choose the Right Workflow Service Client
Narrow Qualifying Tasks Using Precise Filters
Retrieve Subset of Qualifying Tasks (Paging)
Fetch Only the Information That Is Needed for a Qualifying Task
Reduce the Number of Return Query Columns
Use the Aggregate API for Charting Task Statistics
Use the Count API Methods for Counting the Number of Tasks
Create Indexes On Demand for Flexfields
Use the doesTaskExist Method

16.2.1 Minimize Client Response Time

Since workflow client applications are interactive, it is important to have good response time at the client. Some of the factors that affect the response time include service call performance impacts, querying time to determine the set of qualifying tasks for the request, and the amount of additional information to be retrieved for each qualifying task.

16.2.2 Choose the Right Workflow Service Client

Workflow services support two major types of clients: SOAP and EJB clients. EJB clients can be further separated into local EJB clients and remote EJB clients.

If the client application is based on .Net technologies, then only the SOAP workflow services can be used. However, if the client application is based on Java EE technology, then consider which client should be used based on your use case scenarios. The options are listed below:

Remote client - This is the best option in terms of performance in most cases. If the client is running in the same JVM as the workflow services (soa-infra application), the API calls are optimized so that there is no remote method invocation (RMI) involved. If the client is on a different JVM, then RMI is used, which can impact performance due to the serialization and de-serialization of data between the API methods.
SOAP client - While this option is preferred for standardization (based on web services), there are additional performance considerations when compared to the remote method invocation (RMI) used in the remote client. Additional processing is performed by the web-services technology stack which causes the marshalling and unmarshalling of API method arguments between XML.

For more information, see Oracle Fusion Middleware Developer's Guide for Oracle SOA Suite.

16.2.3 Narrow Qualifying Tasks Using Precise Filters

Using precise filters is one of the most important factors in improving response time. When a task list is retrieved, the query should be as precise as possible so the maximum filtering can be done at the database level.

For example, when the inbox view is requested for a user, the tasks are filtered mainly based on whether they are assigned to the current user or to the groups the user belongs to. By specifying additional predicate filters on the inbox view, the overall response time for the query can be reduced since lesser number of tasks qualify.

Alternatively, you can define views by specifying predicate filters and the overall response time for such views is reduced since lesser number of tasks qualify. All predicates passed to the query APIs (or defined in the views) are directly pushed to the database level SQL queries. With this information, the database optimizer can use the best indexes to create an optimal execution plan. The additional filters can be based on task attributes or promoted flex fields. For example, instead of listing all PO approval tasks, views can be defined to present tasks to the user based on priority, date, category, or amount range.

Example: To retrieve all assigned tasks for a user with priority = 1, you can use the following API call:

Predicate pred = new Predicate(TableConstants.WFTASK_STATE_COLUMN,
Predicate.OP_EQ,
IWorkflowConstants.TASK_STATE_ASSIGNED);
pred.addClause(Predicate.AND,
TableConstants.WFTASK_PRIORITY_COLUMN,
Predicate.OP_EQ,

1);
List tasks = querySvc.queryTasks(ctx,
queryColumns,
null,
ITaskQueryService.AssignmentFilter.MY ITaskQueryService.AssignmentFilter.MY,
null,
pred,
null,
startRow,
endRow);

16.2.4 Retrieve Subset of Qualifying Tasks (Paging)

Once the task list has been narrowed down to meet a specific criteria as discussed in the previous section, the next level of filtering is based on how many tasks are to be presented to the user. You want to avoid fetching too many rows, which not only increases the query time but also increases the application process time and the amount of data returned to client. The query API has paging parameters that control the number of qualifying rows returned to the user and the start row.

For example, in the queryTasks method:

List tasks = querySvc.queryTasks(ctx,
queryColumns,
null,
 ITaskQueryService.AssignmentFilter.MY,
null,
pred,
null,
startRow,
endRow);

Consider setting the startRow and endRow parameters to values that may limit the number of return matching records.

16.2.5 Fetch Only the Information That Is Needed for a Qualifying Task

When using the queryTask service, consider reducing the amount of optional information retrieved for each task returned in the list. This may reduce the performance impacts from additional SQL query and Java logic.

For example, in the following queryTasks method, only the group actions information is retrieved. You can also retrieve attachment and payload information directly in the listing, but you may encounter performance impacts.

List<ITaskQueryService.OptionalInfo> optionalInfo
= new ArrayList<ITaskQueryService.OptionalInfo>();
optionalInfo.add(ITaskQueryService.OptionalInfo.GROUP_ACTIONS);
// optionalInfo.add(ITaskQueryService.OptionalInfo.ATTACHMENTS);
// optionalInfo.add(ITaskQueryService.OptionalInfo.PAYLOAD);
List tasks = querySvc.queryTasks(ctx,
queryColumns,
optionalInfo,
ITaskQueryService.AssignmentFilter.MY,
null,
pred,
null,
startRow,
endRow);

In rare cases where the entire payload is needed, then the payload information can be requested. Typically only some of the payload fields are needed for displaying the task list. For example, for PO Tasks, the PO amount may be a column that must be displayed. Rather than fetching the payload as additional information and then retrieving the amount using an xpath expression and displaying it in the listing, consider mapping the amount column from the payload to a flex field. The flex field can then be directly retrieved during SQL querying which may significantly reduce the processing time.

Similarly, for attachments where the name of the attachment is to be displayed in the listing and the document itself is stored in an external repository, consider capturing the attachment name in the payload and mapping it to a flex field, so that processing time is optimized. While constructing the listing information, the link to the attachment can be constructed by fetching the appropriate flex field.

16.2.6 Reduce the Number of Return Query Columns

When using the queryTask service, consider reducing the number of query columns to improve the SQL time. Also, try to use the common columns as they are most likely indexed and the SQL can execute faster.

For example, in the following queryTasks method, only the TASKNUMBER and TITLE columns are returned:

List queryColumns = new ArrayList();
queryColumns.add("TASKNUMBER");
queryColumns.add("TITLE");
...
List tasks = querySvc.queryTasks(ctx,
null,
 ITaskQueryService.AssignmentFilter.MY,
null,
pred,
null,
startRow,
endRow);

16.2.7 Use the Aggregate API for Charting Task Statistics

Sometimes it is necessary to display charts or statistics to summarize task information. Rather than fetching all the tasks using the query API, and computing the statistics at the client layer, consider using the new aggregate APIs to compute the statistics at the database level.

For example, the following call illustrates the use of the API to get summarized statistics based on state for tasks assigned to a user:

List taskCounts = querySvc.queryAggregatedTasks(ctx,
Column.getColumn(WFTaskConstants.STATE_COLUMN),
 ITaskQueryService.AssignmentFilter.MY,
keyWordFilter,
filterPredicate,
false,
false);

16.2.8 Use the Count API Methods for Counting the Number of Tasks

Sometimes it is only necessary to count how many tasks exist that match certain criteria. Rather than calling the queryTasks API method, and determining the size of the returned list, call the countTasks API method, which returns only the number of matching tasks. The performance impact of returning a count of tasks is much lower than returning a list of task objects.

For example, the following call illustrates the use of the API to get the total number of tasks assigned to a user:

int numberOfTasks = querySvc.countTasks(ctx,
ITaskQueryService.AssignmentFilter.MY,
keyWordFilter,
filterPredicate);

16.2.9 Create Indexes On Demand for Flexfields

The workflow schema table WFTASK contains several flexfield attribute columns that can be used for storing task payload values in the workflow schema. Because there are numerous columns, and their use is optional, the installed schema does not contain indexes for these columns. In certain use-cases, for example, where certain mapped flexfield columns are frequently used in query predicates, performance can be improved if you create indexes on these columns.

For example, to create an index on the TEXTATTRIBUTE1 column, the following SQL command should be run:

create index WFTASKTEXTATTRIBUTE1_I on WFTASK(TEXTATTRIBUTE1);

Note:

The exact indexes required depend on the flexfield attribute columns being used, and the nature of the queries being executed. After creating the indexes, the statistics for the WFTASK table should be re-computed and flushed.

16.2.10 Use the doesTaskExist Method

Sometimes it is necessary to check whether any tasks exist that match particular query criteria. Rather than calling the countTasks method, and checking if the number returned is zero, consider using doesTaskExist. The doesTaskExist method performs an optimized query that simply checks if any rows exist that match the specified criteria. This method may achieve better results than calling the countTasks method.

For example, the following call illustrates the use of the API method to determine if a user owns any task instances:

boolean userOwnsTask = querySvc.doesTaskExist(ctx, ITaskQueryService.AssignmentFilter.OWNER,null,null);

16.3 Improving Server Performance

Server performance essentially determines the scalability of the system under heavily loaded conditions. Section 16.2.1, "Minimize Client Response Time" lists several ways in which client response times can be minimized by fetching the right of amount of information and reducing the potential performance impact associated with querying. These techniques also reduce the database and service logic performance impacts at the server and can improve server performance. In addition, a few other configuration changes can be made to improve server performance:

Archive Completed Instances Periodically
Select the Appropriate Workflow Callback Functionality
Minimize Performance Impacts from Notification
Deploy Clustered Nodes

16.3.1 Archive Completed Instances Periodically

The database scalability of a system is largely dependent on the amount of data in the system. Since business processes and workflows are temporal in nature, once they are processed, they are not queried frequently. Having numerous completed instances in the system can slow the system. Consider using an archival scheme to periodically move completed instances to another system that can be used to query historical data. Archival should be done carefully to avoid orphan task instances.

16.3.2 Select the Appropriate Workflow Callback Functionality

The workflow callback functionality can be used to query or update external systems after any significant workflow event, such as assignment or completion of task. While this functionality is very useful, it has to be implemented correctly to avoid impacting performance.

When performance is critical, ensure that there are sufficient resources to update the external system after the task is completed instead of after every workflow event. For example, instead of using a callback, the service can be invoked once after the completion of the task. If a callback cannot be avoided, then consider using a Java callback instead of a BPEL callback. Java callbacks do not have the performance impact associated with a BPEL callback since the callback method is executed in the same thread. In contrast, a BPEL callback may impact performance when sending a message to the BPEL engine, which in turn must be correlated so that it is delivered to the correct process instance. The workflow service has to be called by the BPEL engine after the invocation of the service.

16.3.3 Minimize Performance Impacts from Notification

Notifications are useful for alerting users that they have a task to execute. In environments where most approvals happen through email, actionable notifications are especially useful. This also implies that there is not much load in terms of worklist usage. However if most users interact through the Worklist, and notifications serve a secondary purpose, then notifications should be used judiciously. Consider minimizing the notification to just alert a user when a task is assigned instead of sending out notifications for each workflow event. Also, if the task content is also mailed in the notification there may be an impact to performance. To minimize the impact, consider making the notifications secure in which case only a link to the task is sent in the notification and not the task content itself.

16.3.4 Deploy Clustered Nodes

All workflow instances and state information are stored in the dehydration database. Workflow services are stateless which means they can be used concurrently on a cluster of nodes. When performance is critical and a highly scalable system is needed, a clustered environment can be used for supporting workflow. For more information on clustered architecture, see Section 29.2, "Using Clusters with Oracle Fusion Middleware".

16.4 Completing Workflows Faster

The time it takes for a workflow to complete depends on the routing type specified for the workflow. The workflow functionality provides some options that can be used to improve the amount of time it takes to complete workflows. Some of these options are discussed in this section:

Use Workflow Reports to Monitor Progress
Specify Escalation Rules
Specify User and Group Rules for Automated Assignment
Use Task Views to Prioritize Work

16.4.1 Use Workflow Reports to Monitor Progress

Several workflow reports (and corresponding views) are available that can make monitoring and proactively fixing problems easier. A few of these reports are listed below:

The Unattended Tasks Report provides a list of group tasks that need attention since they have not yet been acquired by any user to work on.
The Task Cycle Time Report gives an idea of how much time it takes for a particular type of workflow to complete.
The Task Productivity Report indicates the inflow and outflow of tasks for different users.
The Assignee Time Distribution Report provides a detailed drill-down of the time spent by each user during the task life cycle (including the idle time when the task was waiting to be picked up by a user.)

All of these reports can be used effectively to fix problems. By checking unattended tasks report, you can assign tasks that have been in the queue for a long time to specific users. By monitoring cycle time and other statistics, you can add staff to groups that are overloaded or take a longer time to complete. Thus reports can be used effectively to ensure workflows complete faster.

16.4.2 Specify Escalation Rules

To ensure that tasks do not get stuck at any user, you can specify escalation rules. For example, you can move a task to a manager if a certain amount of time passes without any action being taken on the task. Custom escalation rules can also be plugged in if the task must be escalated to some other user based on alternative routing logic. By specifying proper escalation rules, you can reduce workflow completion times.

16.4.3 Specify User and Group Rules for Automated Assignment

Instead of manually reassigning tasks to other users or members of a group, you can use user and group rules to perform automated reassignment. This ensures that workflows get timely attention. For example, a user can set up a user rule such that workflows of a specific type and matching a certain filter criteria are automatically reassigned to another user in a specified time window. Similarly, a group rule can be used to automatically reassign workflows to a member of the group based on different routing criteria such as round robin or most productive. Thus rules can help significantly reduce workflow waiting time, which results in faster workflow completion.

16.4.4 Use Task Views to Prioritize Work

A user's inbox can contain tasks of various types with various due dates. The user has to manually sift through the tasks or sort them to find out which one he or she should work on next. Instead, by creating task views where tasks are filtered based on due dates or priority, users can get their work prioritized automatically so they can focus on completing their tasks instead of wasting their time on deciding which tasks to work on. This also results in faster completion of workflows.

16.5 Tuning Identity Provider

The workflow service uses information from the identity provider in constructing the SQL query to determine the tasks qualifying for a user based on his or her role/group membership. The identity provider is also queried for determining role information to determine privileges of a user when fetching the details of a task and determining what actions can the user perform on a task. There are a few ways to speed up requests made to the identity provider.

Set the search base in the identity configuration file to node(s) as specific as possible. Ideally you should populate workflow-related groups under a single node to minimize traversal for search and lookup. This is not always possible; for example, you may need to use existing groups and grant membership to groups located in other nodes. If it is possible to specify filters that can narrow down the nodes to be searched, then you should specify them in the identity configuration file.
Index all critical attributes such as dn and cn in the identity provider. This ensures that when a search or a lookup is done, only a subset of the nodes are traversed instead of a full tree traversal.
Use an identity provider that supports caching. Not all LDAP providers support caching but Oracle Internet Directory supports caching which can make lookup and search queries faster.

16.6 Tuning the Database

The Human Workflow schema is shipped with several indexes defined on the most important columns for all the tables. Based on the type of request, different SQL queries are generated to fetch the task list for a user. The database optimizer evaluates the cost of different plan alternatives (for example, full table scan, access table by index) and decides on a plan that is lower in cost. For the optimizer to work correctly, the index statistics should be current at all times. As with any database usage, it is important to make sure the database statistics are updated at regular intervals and other tunable parameters such as memory, table space, and partitions are used effectively to get maximum performance.

For more information on tuning the database, see Section 2.6, "Tuning Database Parameters".