About the Workflow Recovery Manager
This topic describes Workflow Recovery Manager.
Workflow Recovery Manager Overview
Functionality provided by Workflow Recovery Manager includes:
- Recovers interrupted long running workflow instances upon Siebel Server failure. As the workflow instance is recovered, the workflow engine attempts to continue forward execution. If forward progress is not possible, the workflow is marked as IN_ERROR and manual intervention is required to resume the workflow.
- A workflow instance is resumed at the last checkpoint to maintain the integrity of the execution. The checkpoint is automatically determined by the workflow engine based on hints provided by the developer during design time. The workflow engine automatically persists relevant execution data to resume execution upon workflow process failure.
- Recovery of workflow process instances is load-balanced. One server thread can be responsible for determining the recovered instances. This server thread delegates the actual resumption of the recovered instance to other server threads in a load-balanced manner.
- Workflow recovery can be started without interfering with normal execution of the workflow engine. One server thread can start the recovery of workflow instances, while other threads in the workflow engine still handle new requests to execute workflows. Thus, the workflow engine is not blocked by the recovery thread.
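The division of labor in the last two items can be pictured with a small sketch. This is not Siebel code: the `INSTANCES` table, `find_interrupted`, `resume_instance`, and `recover_all` are hypothetical names, and a Python thread pool stands in for the server threads. One coordinator identifies the interrupted instances and hands each one to a worker; it does not resume any instance itself.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the persisted workflow state.
INSTANCES = {
    "1-A": {"status": "INTERRUPTED", "checkpoint": "Step 3"},
    "1-B": {"status": "COMPLETED", "checkpoint": "End"},
    "1-C": {"status": "INTERRUPTED", "checkpoint": "Step 1"},
}

def find_interrupted(instances):
    """Coordinator role: only identify the interrupted instances."""
    return sorted(k for k, v in instances.items() if v["status"] == "INTERRUPTED")

def resume_instance(instance_id):
    """Worker role: resume one instance from its last checkpoint."""
    state = INSTANCES[instance_id]
    state["status"] = "RUNNING"
    return (instance_id, state["checkpoint"])

def recover_all(workers=2):
    """Delegate each interrupted instance to a pool of worker threads."""
    ids = find_interrupted(INSTANCES)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(resume_instance, ids))

results = recover_all()
```

In this sketch, instances that were not interrupted are never touched, and the pool remains free to accept unrelated work while recovery runs.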
Workflow Recovery Manager Architecture
Figure 24 illustrates the architecture used for recovering a workflow process.
Figure 24. Workflow Recovery Manager Architecture
The Workflow Recovery Manager architecture consists of the following components:
- Workflow Engine. A batch component, WfProcMgr, responsible for executing long running workflows. As the workflow is executed, the workflow engine persists the execution state into the database at the appropriate time.
- Workflow Recovery Manager. A batch component responsible for identifying workflows that were interrupted by a server failure. When Workflow Recovery Manager discovers an interrupted workflow instance, it forwards the instance to a workflow engine, which resumes execution. The recovery manager itself does not execute the workflow instance.
- Database. A database used for storing the execution state of the workflows. The persisted record is used to restore execution state when the workflow process resumes execution from failure.
- Server Message Queue. The workflow engines and the recovery managers multi-cast messages through the Server Message Queue.
- Business Process Administration View. Allows the administrator to manually request the Workflow Recovery Manager to recover an interrupted workflow.
- Server Administration Task Management. Allows the system administrator to create a recurrence task for the Workflow Recovery Manager. The recurrence task requests the server manager to periodically scan for interrupted workflows.
Administering Workflow Recovery Manager
The status of the WfRecvMgr component is Online once it is started, as with other components. However, no tasks are visible unless a workflow instance is recovered after a server failure.
If you run list tasks for comp WfRecvMgr and no executed tasks are returned, it indicates there are no failed workflow process instances. This is expected behavior.
WfRecvMgr in an Active status indicates the WfRecvMgr component is up and running.
You can start the Workflow Recovery Manager in the Siebel client.
To start the Workflow Recovery Manager
- In the Server Manager command-line interface, issue the enable compgrp workflow command.
This command makes sure the Workflow component group is activated on the server. For more information about Server Manager usage, see Siebel System Administration Guide.
- In the Siebel client, navigate to Administration-Server Management > Components.
- Query for Workflow Recovery Manager.
- If Workflow Recovery Manager is not Online, click the Start Up button.
This action starts the Workflow Recovery Manager. It is not necessary to set other parameters.
Testing Workflow Recovery Manager Execution
You can test Workflow Recovery Manager execution.
To prepare for testing of Workflow Recovery Manager execution
- Connect to Server Manager from the Siebel Server.
You should see the smgr> prompt. For more information about Server Manager usage, see Siebel System Administration Guide.
- Issue the following command and check whether the Component Group named Workflow is activated:
- Perform the following steps if the Workflow component group is not activated:
- Enter the following command to activate it:
smgr> enable compgrp workflow
- Stop then restart the server.
- In the Siebel client, navigate to Administration-Workflow Process > Workflow Deployment.
- Make sure there is a workflow running that fails if the server is reset.
For Workflow Recovery Manager to attempt recovery, there must be an instance of an active, running workflow process that fails if it is interrupted. Peruse the Active Workflow Processes applet to make sure such a workflow is present.
To test Workflow Recovery Manager execution
- Open a Telnet session connected to the Siebel Server, issue the siebps command, then make a note of the process ID of the Siebel Server.
If you are using Windows, use Windows Task Manager instead of Telnet.
- Open a second Telnet session then connect to the Server Manager prompt.
- From the smgr> prompt, issue the following command to start the execution of the workflow:
Start task for comp wfprocmgr with processname = 'AutoRec_NN'
A new task for the wfprocmgr component is started with a new task ID.
- From the first Telnet session you opened, check whether the file Inloop.txt is created.
Note that numeric values are added to the file over time.
- Issue the following command to reset the Siebel Server:
Reset_server -e <enterprise> -M <server name>
This causes the workflow you are monitoring to fail.
- Restart the Siebel Server, then enter the siebps command as you did in Step 1.
The Siebel Server process is started again.
- Connect back to the smgr> prompt, then issue the following command to resume execution of the workflow from the point of failure:
smgr> start task for comp wfrecvmgr
A new task for the component wfrecvmgr is started with a new task id.
- From the Telnet session enter the following command to check the contents of the file:
tail -f inloop.txt
Note that numeric values are being added to the file until 2999 is reached.
- Issue the following command to check the last value entered on to the file:
The last and only value in this file should be 3000.
- From the smgr> prompt, issue the following command:
list tasks for comp WfRecvMgr
Observe that there are now one or more tasks in the list. These are tasks Workflow Recovery Manager has taken to recover workflow processes that were interrupted when you reset the server in Step 5.
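If you prefer to check the test file from a script rather than by watching tail output, something like the following could be used. It is only a sketch: `last_value` is a hypothetical helper, and it assumes the file holds one integer per line, which this guide does not state explicitly.

```python
from pathlib import Path

def last_value(path):
    """Return the last non-empty line of the file as an int, or None if there is none.

    ASSUMPTION: the test workflow writes one integer per line.
    """
    lines = [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]
    return int(lines[-1]) if lines else None
```

For example, calling `last_value("inloop.txt")` before the reset and again after recovery shows whether the counter continued from where it stopped.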
Recommended Workflow Recovery Techniques
It is recommended you follow certain techniques to make workflow recovery successful. These techniques include:
- Modify the Allow Retry Flag property on Siebel Operation steps and Business Service steps to reduce the number of checkpoints and thereby minimize run-time overhead. The Recovery Manager performs recovery based on the state information of the process instance, which the Workflow engine saves at recovery checkpoints. For performance optimization, the Workflow engine determines the recovery checkpoints based on the nature of the step and the Allow Retry Flag property of the workflow step. If the Recovery Manager cannot determine from which step a workflow instance must recover, then that instance is marked for manual recovery.
- Break down a complex business service into a number of simpler business services. This makes it easier to individually recover each smaller business service.
- Use multiple recovery managers. Having multiple recovery managers provides a safeguard against transient or permanent system problems. For example, such a condition can exist in an environment where the subnet in which the recovery manager resides is frequently down. However, it is typical to have only one recovery manager as long as the recovery manager itself can be automatically restarted upon server failure.
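The trade-off between checkpoints and the Allow Retry Flag can be illustrated with a toy model. This guide does not spell out the engine's actual checkpoint rule, so the rule below is an assumption chosen only for illustration: state is persisted after every step that is not safe to retry, and an instance that fails inside such a step is left for manual recovery. `Step`, `checkpoints`, and `resume_from` are hypothetical names, not Siebel APIs.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    allow_retry: bool  # stand-in for the Allow Retry Flag property

def checkpoints(steps):
    """ASSUMED rule: persist state after every step that is not safe to retry."""
    return [s.name for s in steps if not s.allow_retry]

def resume_from(steps, failed_index):
    """Return the index to restart from, or None if manual recovery is needed."""
    if not steps[failed_index].allow_retry:
        # Cannot tell whether the interrupted step finished.
        return None
    # Walk back to the most recent checkpointed step and restart just after it.
    for i in range(failed_index - 1, -1, -1):
        if not steps[i].allow_retry:
            return i + 1
    return 0  # no checkpoint was written: restart from the first step

flow = [
    Step("Query Account", allow_retry=True),
    Step("Update Account", allow_retry=False),
    Step("Lookup Contact", allow_retry=True),
    Step("Send Notification", allow_retry=True),
]
```

Under this assumed rule, the four-step flow writes a single checkpoint, and a failure in the last step restarts at "Lookup Contact" rather than repeating the update.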
Recovering a Workflow Process
If the Workflow Process Manager server component fails, Siebel Workflow automatically resumes the interrupted workflow instances when the server restarts. The Recovery Manager performs recovery based on the state information of the process instance that the Workflow engine saves.
You can recover an interrupted workflow process either automatically or manually.
Automatic Recovery of a Workflow Process Instance
If the Workflow Process Manager server component fails due to an event that occurs outside of the Workflow Process Manager server component, such as a server failure, Siebel workflow automatically resumes the interrupted workflow instances when the server restarts.
For a workflow process instance that cannot be automatically recovered, you can manually recover the process. For example, if the server fails in the middle of a Siebel operation to update a record, then the workflow is unable to determine if the Siebel Operation has finished. You might need to manually make sure the update was finished before resuming the workflow execution. In another case, if the Siebel operation queries a set of records, then even after the server fails, the workflow can be resumed automatically by requerying.
Automatic recovery of a workflow process applies to a workflow process that runs in the server component. A workflow process running on a local database cannot be recovered.
Manual Recovery of a Workflow Process Instance
You can correct and resume a workflow process instance that has encountered errors. For example, if the Communications Server is not available, a workflow process sending an email notification will have a status of In Error. You can activate the Communication Server component, then resume the workflow process.
Instances marked for manual recovery are recoverable from the Workflow Instance Admin view. Navigate to Administration-Business Process > Workflow Instance Admin > Related Instances, then choose an option from the applet menu. Options for manual recovery include:
- Resume Instance-Next Step. The process instance skips the current step, and evaluates the branching conditions coming from the current step to determine the next step to execute.
- Resume Instance-Current Step. The process instance resumes from the current step. The current step is retried. If the current step is a Sub Process step, a new subprocess instance is started.
To manually recover a workflow process instance, you use the Workflow Instance Admin view. For more information, see Workflow Instance Admin View.
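The difference between the two menu options can be sketched as follows. This is an illustrative model only: the dictionary layout, the step names, and the functions `resume_current_step` and `resume_next_step` are hypothetical and do not reflect how Siebel stores or evaluates workflow definitions.

```python
def resume_current_step(instance):
    """Resume Instance-Current Step: the current step is retried."""
    return instance["current"]

def resume_next_step(instance, branches):
    """Resume Instance-Next Step: skip the current step and choose the next
    step by evaluating, in order, the branch conditions that leave it."""
    for condition, target in branches[instance["current"]]:
        if condition(instance["props"]):
            return target
    return None  # no branch condition matched

# Hypothetical workflow fragment: two branches leave the "Send Email" step.
branches = {
    "Send Email": [
        (lambda p: p["priority"] == "High", "Escalate"),
        (lambda p: True, "Close Request"),
    ],
}
instance = {"current": "Send Email", "props": {"priority": "Low"}}

retry_target = resume_current_step(instance)
skip_target = resume_next_step(instance, branches)
```

Here `retry_target` is the failed step itself, while `skip_target` is whichever step the first matching branch points to.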
Resuming a Workflow Process Instance
A workflow process instance can resume only if the workflow's Call Depth setting is the highest among the workflow's related process instances. Resume Instance-Next Step and Resume Instance-Current Step are disabled if the workflow instance is part of a set of related instances and one or more of those other instances has a higher Call Depth level.
For example, assume there are multiple related instances with Call Depth settings of level 0, 1, 2, 3, and 4. Assume you choose the record with level 3. In this case, these Resume controls are disabled because level 3 is not the highest Call Depth level set among the related instances.
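The Call Depth rule reduces to a single comparison, shown here as a minimal sketch. `can_resume` is a hypothetical helper name, not part of any Siebel interface.

```python
def can_resume(selected_depth, related_depths):
    """Resume controls are enabled only for the highest Call Depth in the set."""
    return selected_depth == max(related_depths)

depths = [0, 1, 2, 3, 4]
level_3_enabled = can_resume(3, depths)  # the record chosen in the example above
level_4_enabled = can_resume(4, depths)
```

With the depths from the example, the level 3 record cannot be resumed, while the level 4 record can.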