6 Recovering From Faults in the Error Hospital

This chapter describes how to recover from faults in the error hospital, including specifying and saving fault search criteria, viewing aggregated fault statistics, performing bulk fault recoveries and bulk fault terminations in a single operation, accessing recoverable faults to perform a single fault recovery, and creating error notification rules that trigger the sending of alert messages when specific fault criteria are met.

This chapter includes the following sections:

For information about tracking the status of business flow instances, see Tracking Business Flow Instances.

6.1 Managing Faults in the Error Hospital

You can manage all faults occurring within Oracle SOA Suite and view aggregated statistics associated with faults data on the Error Hospital page.

The Error Hospital page provides the following benefits:

  • A single location for managing and recovering from all aggregated faults occurring within Oracle SOA Suite (including rejected message recovery and BPEL message recovery). Regardless of the service engine or binding component in which the fault occurred, you manage faults from the Error Hospital page at the following levels:

    • At the SOA Infrastructure level, where all system-wide faults data is aggregated for each business flow instance.

    • At the individual partition level, where only faults data for the business flow instances associated with that specific partition is aggregated.

  • Error notification rules configuration for triggering an alert when specific fault criteria are met. For example, you can define a rule to trigger an alert if more than 10 errors occur in a 48-hour period.

  • Fault filtering and searching capabilities, and the ability to aggregate fault statistics by name, code, type, owner, and other grouping criteria.

  • Bulk fault recovery and termination capabilities.

  • Details of flow instances associated with the aggregated faults for examining fault trends.

To manage faults in the error hospital:

Access this page through one of the following options:

To access faults in all partitions:

From the SOA Infrastructure Menu...

  1. Select Home > Error Hospital.

From the SOA Folder in the Navigator...

  1. Expand SOA > soa-infra.

  2. Click soa-infra (server_name).

  3. Click the Error Hospital tab.

From the SOA Composite Menu...

  1. Select SOA Infrastructure.

  2. Click the Error Hospital tab.

To access faults in an individual partition:

From the SOA Infrastructure Menu...

  1. Select Manage Partitions.

  2. In the SOA Partition column, select a specific partition.

  3. Click the Error Hospital tab.

From the SOA Folder in the Navigator...

  1. Expand SOA > soa-infra (server_name).

  2. Select a specific partition.

  3. Click the Error Hospital tab.

The Error Hospital page displays the following details:

  • A utility for specifying and saving comprehensive fault search criteria. After you specify the criteria, click Search to run the query.

    Note:

    When you initially access the Error Hospital page, the Fault Statistics table is empty. You must click Search to populate this table with fault details.

  • A Fault Statistics table that provides the fault name, total number of faults, faults requiring recovery, unrecoverable faults, recovered faults, and automatic fault retries. Click a number to search for flow instances associated with the aggregated faults (this takes you to the Flow Instances page). To display a different fault attribute in the first column of the table (such as fault name, code, type, owner, and other grouping criteria), select the Group By list. To display additional columns in the table, select View > Columns.

  • Bulk Recovery and Bulk Abort buttons above the Fault Statistics table for performing bulk actions (recovery or abort) on a selected group of similar faults in a single operation.

    Note:

    • When you click a faults link or similar links elsewhere in Oracle Enterprise Manager Fusion Middleware Control, you are taken to the Error Hospital page with the fault report data already displayed. For example, when you click the Error Hospital button above the Search Results table in which business flow instances are displayed on the Flow Instances page, you see the aggregated fault statistics reported for those flow instances. In addition, when you click a specific fault state in the graph in the Business Transaction Faults section of the Dashboard page, you are taken to the Error Hospital page with the fault report data of the selected state already displayed.

    • Report data is delimited by the time period for which instances and faults are retrieved. The current delimiter is displayed to the right of the Fault Statistics table title. The default value is 24 hours. You can change this value with the Default Query Duration property on the SOA Infrastructure Common Properties page. For information, see Configuring the Audit Trail, Payload Validation, and Default Query Duration.
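The effect of the query duration on report data can be pictured with a short sketch. This is an illustrative model only, not Oracle SOA Suite code: the `Fault` class, the `faults_in_window` helper, and the sample fault data are all assumptions made for the example, and the 24-hour constant simply mirrors the default described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical constant mirroring the 24-hour default query duration.
DEFAULT_QUERY_DURATION = timedelta(hours=24)


@dataclass(frozen=True)
class Fault:
    """Illustrative fault record; not a real SOA Suite type."""
    name: str
    occurred_at: datetime


def faults_in_window(faults, now, duration=DEFAULT_QUERY_DURATION):
    """Return only the faults that occurred within `duration` before `now`."""
    cutoff = now - duration
    return [f for f in faults if cutoff <= f.occurred_at <= now]


now = datetime(2024, 1, 2, 12, 0)
faults = [
    Fault("NegativeCredit", datetime(2024, 1, 2, 9, 0)),   # 3 hours old
    Fault("remoteFault", datetime(2023, 12, 31, 12, 0)),   # 48 hours old
]
recent = faults_in_window(faults, now)
print([f.name for f in recent])
```

With the default duration, only the three-hour-old fault is reported. Widening the duration (for example, to 72 hours) brings the older fault back into the report.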

You can perform the following fault management tasks:

6.1.1 Specifying and Saving Fault Search Criteria

The Report Filters section enables you to specify and save comprehensive fault search criteria. Search results are displayed in the Fault Statistics table.

To specify and save fault search criteria:

  1. Click the Search icon to display the Report Filters section. The Report Filters section displays in a sliding panel and may not be visible in the page at all times.
  2. See the following sections to specify and save comprehensive fault search criteria.

6.1.1.1 Executing Predefined Fault Instance and Custom Searches

You can quickly find faults without entering any search criteria by selecting a predefined search option. Results are displayed in the Fault Statistics table. The searches are constrained by a predefined time period. The default time period is 24 hours. This value can be changed by modifying the Default Query Duration property in the SOA Infrastructure Common Properties page, accessible under SOA Administration in the SOA Infrastructure menu.

The following options are available:

  • Instances With Faults: Displays recent instances that have faults. This predefined search option is also available on the Flow Instances page, where you can select it from the Search Options list or click the Instances With Faults link.

  • All Saved Searches: Displays custom searches you have created and saved. Saved searches are also displayed in the Search region of the Dashboard page.

To execute predefined or custom fault instance searches:

At the top of the Search Options section, select the option for which to search.

The search results are displayed in the Fault Statistics table.

For more information about predefined fault instance searches, select Help > Help for This Page from the weblogic main menu on the Error Hospital page.

For information about saved searches, see Using the Report Filters Toolbar.

6.1.1.2 Using the Report Filters Toolbar

The Report Filters toolbar enables you to perform search-related tasks, such as resetting displayed fault search filter criteria, saving fault search filter criteria, and bookmarking searches. By default, only predefined searches can be invoked. You can extend the list of available searches by saving custom searches. The Report Filters toolbar displays in a sliding panel and may not be visible in the page at all times. If the panel is not already open, you can open it by clicking the large Search Options icon.

To use the Report Filters toolbar:

Go to the toolbar in the Report Filters section.

The following options are available:

Element Description
Reset icon

Click to reset the search fields in the currently invoked saved search to the last saved values. This is useful when you have modified a saved search and want to restart the query building process.

Save icon

Click to save your current search criteria. This saves both the selected search fields and their values, enabling you to run the identical search at a later time and view a fresh set of results. Searches are saved per user, and not globally. For example, user A cannot log in to Oracle Enterprise Manager Fusion Middleware Control and access the saved search criteria of user B.

You must provide a name when saving a search. You cannot overwrite an existing saved search, but you can save it with a different name. You can delete the saved searches you created. To manage your saved searches, select All Saved Searches from the Report Filters list.

Bookmark icon

Click to bookmark your current search criteria. A message is displayed with a URL containing the search parameters. Copy the URL to a browser bookmark window, email, or chat. The generated URL includes information about both the selected search fields and their values. This enables you to run the identical search at a later time and view a fresh set of results.

For more information about the Report Filters toolbar, select Help > Help for This Page from the weblogic main menu on the Error Hospital page.
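To illustrate how a bookmarked URL can carry both the selected search fields and their values, the following sketch encodes a set of criteria into a query string and reads them back. The base URL and the parameter names (`faultType`, `timeValue`, `timeUnit`, `composite`) are hypothetical; the URL that Oracle Enterprise Manager Fusion Middleware Control actually generates has its own format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL; the console generates its own address.
BASE_URL = "https://em.example.com/error-hospital"


def bookmark_url(criteria):
    """Encode the non-blank search fields and their values as a query string."""
    filled = {key: value for key, value in criteria.items() if value not in (None, "")}
    return f"{BASE_URL}?{urlencode(filled)}"


# Hypothetical field names; the blank composite filter is left out of the URL.
url = bookmark_url(
    {"faultType": "System", "timeValue": 2, "timeUnit": "Weeks", "composite": ""}
)
restored = parse_qs(urlsplit(url).query)
print(url)
print(restored)
```

Because the URL stores criteria rather than results, opening it later reruns the same search against current data.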

6.1.1.3 Configuring and Saving Fault Search Filter Criteria

You can configure parameters for each search filter to create a fault search query. Search results are displayed in the Fault Statistics table.

To configure and save fault search filter criteria:

  1. Go to the Report Filters section.
  2. Specify search criteria. If you want to further customize fault search criteria, click More next to the Fault filter to display additional configuration fields such as fault owner, recovery type, fault type, and fault details (error message contents, fault name, fault code, HTTP host, and JNDI name).
  3. Configure parameters for appropriate search filters. Filters left blank are ignored. You do not need to remove them. For more information about configuring each filter, select Help > Help for This Page from the weblogic main menu on the Error Hospital page.
    Element Description

    Time

    Use this filter to restrict your query to a specific time in the past. A time filter is required to search for faults. Ensure that you select values from both lists. For example, select 2 and Weeks to restrict your query to the last two weeks.

    Click Options to display the filters available for selection. Once selected, you can specify the time period to search.

    • Instance Created (displays by default and cannot be deselected)

    • Instance Updated

    • Fault Occurred (displays by default, but you can deselect it)

    You can also specify a custom time period for which to search by selecting the Custom time period checkbox.

    Composite

    Restrict your search query for faults to a specific composite.

    • If searching at the partition level, only faults that were initiated and participated in by SOA composite applications in that partition are returned.

    • If searching at the SOA Infrastructure level, faults initiated or participated in by SOA composite applications in any partition are returned.

    Perform the following tasks:

    • Select Initiating if you want to limit your search only to the faults that started in the selected composite. To search for all faults in that composite, select Participating.

    • Select the partition to search. If you access the Error Hospital page at the individual partition level, that partition is already selected and you cannot change it. If you do not select a partition at the SOA Infrastructure level, all are searched.

    • Select the specific SOA composite application name from the list or click Search to specify a complete or partial name for which to search. A partial name search matches only the beginning of the name and is case sensitive. If you do not specify a composite, all are searched.

    Resequencer

    Select the resequencing groups for which to search. Use this filter to limit your search only to business flows in which a resequenced component participated. If you leave this section blank, this search filter is ignored. The resequencer in Oracle Mediator rearranges a stream of related but out-of-sequence messages into a sequential order.

    • Any Group: Select to search for faults in all resequenced flows in all groups.

    • Specific Group: Select to find faults associated with a specific resequencing group. Specify the group's name and location. The location is the Oracle Mediator service component and SOA composite application revision containing the group. The group name filter returns only one group instance.

    For more information about resequencing, see Monitoring Resequencing Groups and Resequencing in Oracle Mediator in Developing SOA Applications with Oracle SOA Suite.

    State

    Select Active to search active instances. Active instances have not reached one of the terminal states. The list is refreshed to display the following selections for further filtering:

    • All active: Finds all business flows in nonterminal states.

    • Recovery: A business flow with a recoverable fault (recoverable faults, recovered faults, and system automatic retries are all included in this category). Use the Fault filter to further specify a particular type of recovery, such as selecting Recovery Required from the Fault list, clicking More, and selecting an option from the Fault Recovery Type list.

    • Suspended: A business flow that is typically related to migration of one version of the SOA composite application to another.

    • Running: A business flow is currently running. The flow may include a human task component that is currently awaiting approval.

    Select Inactive to search inactive instances. Inactive instances have reached one of the terminal states. The list is refreshed to display the following selections for further filtering:

    • All inactive: Finds all terminated business flows.

    • Completed: A business flow has completed successfully. There are no faults awaiting recovery.

    • Failed: Finds completed business flows with nonrecoverable faults. Use the Fault filter to further filter based on fault type and other fault details. For example, if State is set to Failed, Nonrecoverable or All Faults can be used.

    • Aborted: Finds business flows explicitly terminated by the user or for which there was a system error.

    If you leave the State field blank, the State filter is ignored.

    Fault

    Restrict the search for business flows to only those with faults. If you leave this field blank, this filter is ignored. Select to specify the types of faults for which to search.

    • All Faults: Select to search for business flows containing any type of fault.

    • Recovery Required: Select to search for stuck flows awaiting a human recovery action to proceed. To further specify a particular type of recovery, use the Fault Recovery Type filter.

    • Nonrecoverable: Select to search for flows containing nonrecoverable faults. Nonrecoverable faults include aborted and failed instances.

    • Recovered: Select to search for flows that contain at least one recovered fault.

    • System Auto Retries: Select to find the faulted flows in which system automatic retries occurred. This applies only to fault policy-configured automatic retries.

    To further customize fault filtering, click More next to the Fault filter to display additional filtering attributes.

    • Fault Recovery Type: Filter your search for faulted business flows to stuck flows awaiting a particular type of recovery action. This field is available when you select Recovery Required in the Fault filter and click More. If you leave this field blank, the Fault Recovery Type filter is ignored. The fault recovery types are as follows:

      Admin Recovery

      BPEL Activity Recovery

      BPEL Invoke Message Recovery

      BPEL Callback Message Recovery

      EDN Recovery

      Mediator Recovery

      Human Workflow Recovery

      Rejected Message Recovery

    • Fault Type: Filter your search for faulted business flows. If you leave this field blank, the Fault Type filter is ignored. Select one of the following to restrict your search to only the flows containing that fault type:

      System: Network errors or errors such as a database server or web service being unreachable.

      Business: Application-specific faults that are generated when there is a problem with the information being processed (for example, a social security number is not found in the database).

      OWSM: Errors on Oracle Web Service Manager (OWSM) policies attached to SOA composite applications, service components, or binding components. Policies apply security to the delivery of messages.

    • Fault Owner: Select the specific component, service, or reference in which the fault was handled (also known as the fault owner). Use this filter to further narrow down your search for faulted business flows. If you leave it blank, the Fault Owner filter is ignored. The fault owner is similar to the fault location, but they are not always the same. For more information about this filter, see the online help for the Error Hospital page.

    • Fault Details: Filter a search for faulted business flows. If you leave all fields blank, the Fault Details filter is ignored. Specify at least one of the following details about the fault. To find only faults for which these values are not set, enter NOT SPECIFIED in the search field.

      Error Message Contains: Use to find only faulted business flows with the same error message text. You can enter any part of the message. This search is case sensitive.

      Fault Name: Use to find only faulted business flows with a specific descriptive fault name such as NegativeCredit. You must enter the exact name (the entire string). This search is case sensitive.

      Fault Code: Use to find only faulted business flows with the same fault code.

      To further customize fault search criteria, click More next to Fault Details to display configuration fields such as HTTP host, JNDI name, event name, and event namespace.

  4. If you want to save search criteria for future use, click the Save Search icon to specify a name.

    Your saved search is then available for selection in the Report Filters dropdown list and the Search Options section of the Flow Instances page.

  5. When search criteria creation is complete, click Search.

    View search results in the Fault Statistics table.
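The matching rules described for the filters above (blank filters are ignored, partial composite names match only from the beginning, and fault name and error message searches are case sensitive) can be summarized in a small sketch. This is a conceptual model under stated assumptions, not the product's query implementation: the `Fault` and `Filters` classes, the `matches` function, and the sample composite names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fault:
    """Illustrative fault record; not a real SOA Suite type."""
    composite: str
    fault_name: str
    message: str


@dataclass(frozen=True)
class Filters:
    """Hypothetical filter set; a value of None means the filter is blank."""
    composite_prefix: Optional[str] = None   # beginning of the name, case sensitive
    fault_name: Optional[str] = None         # exact name, case sensitive
    message_contains: Optional[str] = None   # any part of the message, case sensitive


def matches(fault, filters):
    """Apply every filter that is set; blank filters are ignored."""
    if filters.composite_prefix and not fault.composite.startswith(filters.composite_prefix):
        return False
    if filters.fault_name and fault.fault_name != filters.fault_name:
        return False
    if filters.message_contains and filters.message_contains not in fault.message:
        return False
    return True


faults = [
    Fault("OrderBooking", "NegativeCredit", "Credit check failed for customer"),
    Fault("orderStatus", "remoteFault", "Connection refused by remote host"),
]

by_prefix = [f.composite for f in faults if matches(f, Filters(composite_prefix="Order"))]
by_message = [f.composite for f in faults if matches(f, Filters(message_contains="refused"))]
print(by_prefix)
print(by_message)
```

In this model, the prefix `Order` does not match `orderStatus` because the comparison is case sensitive, and an empty `Filters()` matches every fault.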

6.1.2 Viewing Aggregated Fault Statistics to Examine Fault Trends

The Fault Statistics table displays a report on faults data specified and created in either of the following ways:

  • Specified and created in the Report Filters section of the Error Hospital page.

  • Specified and created in the Search Options section and displayed in the Search Results table of the Flow Instances page, and then displayed in the Fault Statistics table by clicking the Error Hospital link above the Search Results table.

The data is always aggregated by one of the primary fault attributes selected from the Group By list, such as Fault Name, Fault Code, and so on. The default aggregation is by Fault Name.

The Error Hospital page does not show individual faulted instances. To track individual business flows that have faults, perform one of the following tasks:

  • Go to the Flow Instances page and click Instances With Faults.

  • Click a fault count in the Fault Statistics table of the Error Hospital page to access details about that fault in the Search Results table of the Flow Instances page.

The Fault Statistics table enables you to examine fault trends (such as for diagnostic purposes). For example, aggregate by Fault Code to see which code has the most faults. You can also perform bulk actions (recovery or abort) on a selected group of similar faults in a single operation.
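As a rough illustration of this kind of aggregation, the sketch below groups a list of fault records by a chosen attribute and counts faults per state, then picks the fault code with the most faults. The record layout, the state labels, and the sample fault names and codes are assumptions made for the example; they do not reflect how the Error Hospital stores or computes its statistics.

```python
from collections import Counter, defaultdict

# Sample data; fault names, codes, and state labels are illustrative only.
faults = [
    {"fault_name": "NegativeCredit", "fault_code": "1001", "state": "recovery_required"},
    {"fault_name": "NegativeCredit", "fault_code": "1001", "state": "recovered"},
    {"fault_name": "remoteFault", "fault_code": "2002", "state": "nonrecoverable"},
    {"fault_name": "bindingFault", "fault_code": "1001", "state": "recovery_required"},
]


def aggregate(faults, group_by="fault_name"):
    """Count total faults and faults per state for each value of `group_by`."""
    stats = defaultdict(Counter)
    for fault in faults:
        row = stats[fault[group_by]]
        row["total"] += 1
        row[fault["state"]] += 1
    return {key: dict(row) for key, row in stats.items()}


by_code = aggregate(faults, group_by="fault_code")
worst_code = max(by_code, key=lambda code: by_code[code]["total"])
print(worst_code, by_code[worst_code])
```

Changing the `group_by` argument plays the role of the Group By list: the same faults are counted, but the rows are keyed by a different attribute.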

6.1.2.1 To View Aggregated Fault Statistics to Examine Fault Trends:
  1. Specify search criteria in the Report Filters section as described in Specifying and Saving Fault Search Criteria, and click Search.

    The Fault Statistics table is populated with details about faults. The table shows the total number of faults, faults requiring recovery, unrecoverable faults, recovered faults, and automatic fault retries.

    The legend above the Fault Statistics table displays the color symbols used in the columns of the table to identify the state of faults.

    State Description

    Nonrecoverable

    Displays the total count of nonrecoverable faults. This includes failed and aborted faults.

    Clicking the value in the column takes you to the Search Results table on the Flow Instances page and filters the list to show the flow instances associated with nonrecoverable faults. Terminal (fatal) faults cannot be recovered.

    Recovery Required

    Displays the total count of recoverable faults.

    Clicking the value in the column takes you to the Search Results table on the Flow Instances page and filters the list to show the flow instances associated with recoverable faults. These are faults awaiting a human recovery action so that stuck flows can proceed.

    Recovered

    Displays the total count of recovered faults.

    Clicking the value in the column takes you to the Search Results table on the Flow Instances page and filters the list to show flow instances associated with recovered faults. These are recoverable faults on which a recovery action was performed successfully. Processing has resumed in the business flow instance.

    System Auto Retries

    Displays the total count of faults that are automatically retried by the system.

    Clicking the value in the column takes you to the Search Results table on the Flow Instances page and filters the list to show the flow instances associated with these system retried faults.

  2. From the Group By list above the Fault Statistics table, select the fault attribute by which to aggregate data. Fault Name is the default aggregation field.

    The following options are available:

    Element Description

    Fault Name

    Aggregates by the fault name. This aggregation option is selected by default.

    Fault Code

    Aggregates by the fault code.

    Fault Type

    Aggregates by the fault type:

    • System: Network errors or errors such as a database server or web service being unreachable.

    • Business: Application-specific faults generated when there is a problem with the information being processed (for example, a social security number is not found in the database).

    • OWSM: Errors on OWSM policies attached to SOA composite applications, service components, or binding components. Policies apply security to message delivery.

    Composite

    Aggregates by the SOA composite application name.

    Partition

    Aggregates by the partition of the SOA composite application in which the fault occurred.

    Fault Owner

    Aggregates by the name of the service component, service binding component, or reference binding component that handled the fault. In some cases, this can be both the fault owner and fault location.

    Fault Owner Type

    Aggregates by the type of service component, service binding component, or reference binding component that handled the fault (for example, if a BPEL process service component owns the fault, BPEL is displayed).

    JNDI Name

    Aggregates by the JNDI name (for example, eis/FileAdapter).

    HTTP Host

    Aggregates by the HTTP host on which the fault occurred.

  3. If you select Fault Code, each row in the first column represents a specific code and the remaining columns show the fault statistics aggregated for each code. Regardless of your selection, the remaining columns in the table always show the total number of faults; the number of recoverable, nonrecoverable, and currently recovered faults; and the number of automatic retries performed after a fault occurred.

  4. If you select Fault Type, each row in the first column represents a specific fault type and the remaining columns show the fault statistics aggregated for each type. As with all selections in the list, you can click the total, recoverable, and recovered numbers that are displayed to access the Flow Instances page for performing fault recovery actions.

  5. If you select Composite, each row in the first column represents a specific SOA composite application name and the remaining columns show the fault statistics aggregated for each composite.

6.1.3 Performing Bulk Fault Recoveries and Terminations in a Single Operation

You can perform bulk fault recoveries and bulk fault terminations on any aggregated fault row in the Fault Statistics table that has recoverable faults. Options for performing these actions are displayed above the Fault Statistics table.

To perform bulk fault recoveries and terminations:

  1. Specify search criteria in the Report Filters section as described in Specifying and Saving Fault Search Criteria, and click Search.

    The Fault Statistics table is populated with details about faults.

  2. Select a row in the table in which the Recovery Required column has a value of more than one. Note that you can also recover single instances through this option.
  3. Click an appropriate action above the table (for this example, Bulk Recovery is selected). You can also right-click a row to display the same actions.
    Element Description

    Bulk Recovery

    Select this option to perform a bulk recovery. Only rows with faults identified as recoverable can be recovered in bulk.

    Bulk Abort

    Select this option to perform a bulk termination. Rows with faults identified as recoverable and rows with faults in the System Auto Retries column can be terminated in bulk.

    Bulk Recovery attempts a recovery on all recoverable faults associated with the selected aggregated row. For this example, the selected row includes three faults that require recovery in the Recovery Required column.

  4. Click Yes when prompted to continue with the bulk recovery. You can also expand the Schedule Properties and Throttling Properties sections to display recovery details.

    Note:

    Scheduling and throttling properties are available only when Oracle Enterprise Scheduler is deployed, because bulk recoveries can be scheduled only through Oracle Enterprise Scheduler. Without it, this dialog does not include the Schedule Properties and Throttling Properties sections, and bulk recovery is attempted immediately. In that case, the bulk recovery job status link takes you to the log viewer page, where you can see the logs corresponding to the bulk recovery execution.

    A message is displayed indicating that recovery is in progress. If Oracle Enterprise Scheduler is deployed, you can click the link in the message to access the Request Details page of Oracle Enterprise Scheduler. If Oracle Enterprise Scheduler is not deployed, clicking the job ID invokes the Log Messages page.

    Note the following details about the Oracle Enterprise Scheduler job request number and fault alert message:

    • You can search for the job request number by clicking Bulk Recovery Jobs in the Search section of the Dashboard page and specifying the number in the Job Request ID field. Click the ID to go to the Oracle Enterprise Scheduler Job Request page. For more information, see Searching for Instances and Bulk Recovery Jobs.

    • When an alert message is triggered, the name is displayed in the Fault Alerts section of the Dashboard page. For more information, see Viewing Error Notification Alerts.

  5. Click OK to acknowledge that the bulk recovery job is being handled through an Oracle Enterprise Scheduler job request number.
  6. At the top of the Error Hospital page, click the Refresh icon.

    If fault recovery was successful, the number of recovered faults that are displayed in the Recovered column of the Fault Statistics table is increased (for this example, by three).

  7. In the Recovered column, click the number.

    The Flow Instances page is displayed. The business flow instance that previously was displayed as Recovery in the State column is now displayed as Completed in the Search Results table.

  8. Select the row that includes the business flow instance in the Search Results table, and click Show Details.

    The Faults tab is displayed at the bottom of the Flow Instances page. In the Recovery column, the fault status is displayed as Recovered.
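The eligibility rules for the two bulk actions, and the change in the Recovered column after a successful recovery, can be modeled with a short sketch. Everything here is hypothetical: the `FaultRow` class and helper functions are not part of any Oracle API, and the sketch assumes that every recovery in the row succeeds and that the Recovery Required count drops accordingly.

```python
from dataclasses import dataclass


@dataclass
class FaultRow:
    """Illustrative model of one aggregated row in the Fault Statistics table."""
    name: str
    recovery_required: int = 0
    recovered: int = 0
    system_auto_retries: int = 0
    nonrecoverable: int = 0


def can_bulk_recover(row):
    """Only rows with recoverable faults qualify for bulk recovery."""
    return row.recovery_required > 0


def can_bulk_abort(row):
    """Rows with recoverable faults or system auto retries qualify for bulk abort."""
    return row.recovery_required > 0 or row.system_auto_retries > 0


def apply_successful_bulk_recovery(row):
    """Update the row as it might appear after a refresh, assuming all recoveries succeeded."""
    recovered_now = row.recovery_required
    row.recovered += recovered_now
    row.recovery_required = 0
    return recovered_now


row = FaultRow("NegativeCredit", recovery_required=3)
retry_only = FaultRow("remoteFault", system_auto_retries=2)

if can_bulk_recover(row):
    count = apply_successful_bulk_recovery(row)
    print(f"Recovered {count} faults; Recovered column now shows {row.recovered}")
print(can_bulk_recover(retry_only), can_bulk_abort(retry_only))
```

The second row shows the asymmetry between the two actions: a row with only system auto retries cannot be recovered in bulk, but it can be terminated in bulk.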

6.1.3.1 Using Additional Bulk Recovery Options for BPEL Processes

Depending on your fault policies, BPEL faults provide additional recovery options like Replay, Rethrow, and Continue. You can use these additional recovery options when bulk-recovering your BPEL faults.

In order to use BPEL-specific recovery options, use the following steps to filter BPEL faults in Error Hospital:
  1. Under the Group By field, select Fault Owner Type.

    This option groups your search results by fault owner type, such as BPEL and Mediator.

  2. Select the BPEL row, and click Bulk Recovery Options for the fault recovery options.

    The BPEL Bulk Recovery Options menu appears.

  3. Select the appropriate bulk recovery action, such as Replay, Continue, or Rethrow, to continue with the bulk recovery.

    See Recovering from Faults in a Business Flow Instance for more details on the individual BPEL recovery options.

6.1.4 Accessing Faults in the Fault Statistics Table to Perform Single Fault Recovery Operations

The Error Hospital page does not show individual faulted instances. However, you can click a fault count in the Fault Statistics table of the Error Hospital page to access that fault for performing single fault recovery operations in the Search Results table of the Flow Instances page.

To access faults in the Fault Statistics table to perform single fault recovery:

  1. Specify search criteria in the Report Filters section as described in Specifying and Saving Fault Search Criteria, and click Search. For example, select Recovery Required in the Fault filter.

    The Fault Statistics table is populated with details about faults that require recovery.

  2. In the Recovery Required column, click the number of faults requiring recovery for a specific fault name. The number can be a value of more than one. Note that you can also recover single instances through this option.

    You are taken to the Search Results table of the Flow Instances page.

    The faults requiring recovery are displayed. If you instead clicked the value in a different column (for example, the Total Faults or Recovered column in the Fault Statistics table), results appropriate to that selection are displayed.

  3. Select a specific row in the Search Results table, and click Show Details.

    The page is refreshed to display the Faults (selected), Sensor Values, Composites, and (if resequencing groups are included in the composite) Resequencing Groups tabs below the Search Results table. Each tab describes specific details about the flow.

  4. To perform fault recovery actions from the Faults tab, see Step 4 of Recovering from Faults in a Business Flow Instance.

6.1.5 Understanding Additional Message and Fault Recovery Behavior Scenarios

This section describes additional fault message behavior issues on the Error Hospital page.

6.1.5.1 Recoverable Messages are Displayed as Unrecoverable in the Error Hospital

When message delivery fails on one node (the managed server) of a cluster, undelivered messages are displayed as follows:

  • Unrecoverable on the Error Hospital page

  • Recoverable on the BPEL process service engine Recovery page

This occurs when BPEL process invoke activities are processing during a server shutdown. These activities may not complete, even if a graceful shutdown occurs. In these cases, the instances are shown as running and unrecoverable on the Error Hospital page because the BPEL process service engine cannot update the business flow state during a server shutdown.

You can manually recover the BPEL invoke activities on the BPEL process service engine Recovery page. Otherwise, they are recovered during automatic recovery.

For more information, see Performing BPEL Process Service Engine Message Recovery.

6.1.5.2 Unrecoverable Binding Component Faults are Displayed as Recoverable

A FabricInvocationException.RetryType.NO_RETRY error returned by a database adapter reference binding component is treated as a binding fault. Even though the fault is nonretriable, the following is displayed:

  • There is a recoverable message on the BPEL process service engine Recovery page.

  • The flow state is displayed as recoverable because of the message in the BPEL process invoke activity recovery queue.

This is the expected behavior. In 12c, common faults and BPEL process messages are linked together. This means the fault and flow state both indicate that an invoke activity recovery is required.

For more information, see Performing BPEL Process Service Engine Message Recovery.

6.1.5.3 BPEL Process Messages Awaiting Recovery with no Associated Instance Faults Do Not Appear on the Error Hospital Page

If messages are awaiting recovery on the BPEL process service engine Recovery page and the instance has no associated fault, the messages are not shown on the Error Hospital page. This can occur in the following scenarios:

  • If a callback message arrives late and the instance has already completed.

  • If a race condition occurs when using message aggregation with reenableAggregationOnComplete=true. When messages are sent around the same time, most of them are marked as midprocess receive messages and there are no new instances to pick them up.

For more information about message aggregation, see "Routing Messages to the Same Instance" of Developing SOA Applications with Oracle SOA Suite.
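The relationship between the two pages can be illustrated with a toy model. This sketch is not an Oracle API: the `RecoveryMessage` type, the `visible_on_error_hospital` helper, and the identifiers in the sample data are hypothetical names chosen for illustration. It assumes only what this section states, namely that a message awaiting recovery is surfaced on the Error Hospital page only when its instance has an associated fault.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class RecoveryMessage:
    """Hypothetical record for a message listed on the BPEL Recovery page."""
    message_id: str
    instance_id: str


def visible_on_error_hospital(
    messages: List[RecoveryMessage], faulted_instances: Set[str]
) -> List[RecoveryMessage]:
    """Keep only the messages whose instance has an associated fault."""
    return [m for m in messages if m.instance_id in faulted_instances]


# Sample data (assumed): msg-2 models a late callback for an instance
# that already completed, so no fault is associated with it.
pending = [
    RecoveryMessage("msg-1", "flow-100"),
    RecoveryMessage("msg-2", "flow-200"),
]
shown = visible_on_error_hospital(pending, faulted_instances={"flow-100"})
print([m.message_id for m in shown])
```

In this model, `msg-2` remains recoverable on the Recovery page but never reaches the Error Hospital listing, which mirrors the late-callback scenario described above.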

6.2 Creating Error Notification Rules

You can create error notification rules at the SOA Infrastructure or individual partition level that cause an alert message to be triggered when specific fault criteria are met. For example, you can create a rule that sends an alert if more than 10 errors occur in a 48-hour period. You can configure the alert to be sent to the Fault Alerts section of the Dashboard page, described in Viewing Error Notification Alerts, and also to a delivery channel such as an email address.

Note:

To create error notification rules, Oracle Enterprise Scheduler must be deployed to the SOA Infrastructure. If Oracle Enterprise Scheduler is not deployed, you cannot access this page.

The error notification rules provide the following benefits:

  • An aggregated notification of faults occurring in the system.

  • A schedule-based notification system with a configurable recurrence interval. For example, send an alert every 24 hours if rule criteria are met.

  • Rule-configured faults and notification channel specifications. When a fault policy is triggered, an email is sent.

You can create fault notification rules at the following levels:

  • SOA Infrastructure (for system-wide alerts)

  • Individual partition level (for alerts specific to that partition)

The following roles are required for creating, updating, and deleting rules:

  • partition_nameApplicationOperator: This role is partition-specific. A user in this partition-specific role has the permissions to manage alerts for that partition.

  • MiddlewareOperator

  • MiddlewareAdministrator

  • SOAAdmin

  • SOAOperator

For more information, see Securing Access to Partitions.

Note the following details about the display of rules in Oracle Enterprise Manager Fusion Middleware Control:

  • Rules created at the SOA Infrastructure (system-wide) level are not displayed in the Error Notification Rules page at the individual partition level.

  • Rules created at the individual partition level are not displayed in the Error Notification Rules page at the SOA Infrastructure (system-wide) level.

The Fault Alerts section of the SOA Infrastructure Dashboard page shows all system-wide alerts, including all partitions.
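The display behavior described above can be summarized in a small sketch. It is a hedged model rather than product code: the `Rule` type, both helper functions, and the partition name `hr` are hypothetical. It also assumes that a partition-level page lists only that partition's own rules and that a partition-level Dashboard shows only that partition's alerts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record; partition=None means SOA Infrastructure level."""
    name: str
    partition: Optional[str] = None


def rules_shown_on_page(rules: List[Rule], page_partition: Optional[str]) -> List[Rule]:
    """Rules listed on an Error Notification Rules page.

    page_partition=None models the SOA Infrastructure page. Each page lists
    only the rules created at its own level (assumption for partition pages).
    """
    return [r for r in rules if r.partition == page_partition]


def alerts_shown_on_dashboard(
    alert_rules: List[Rule], dashboard_partition: Optional[str]
) -> List[Rule]:
    """Rules whose alerts appear in the Fault Alerts section of a Dashboard.

    The SOA Infrastructure Dashboard shows alerts from every level;
    a partition Dashboard is assumed to show only its own alerts.
    """
    if dashboard_partition is None:
        return list(alert_rules)
    return [r for r in alert_rules if r.partition == dashboard_partition]


rules = [Rule("infra-rule"), Rule("default-rule", "default"), Rule("hr-rule", "hr")]
print([r.name for r in rules_shown_on_page(rules, None)])
print([r.name for r in rules_shown_on_page(rules, "default")])
print([r.name for r in alerts_shown_on_dashboard(rules, None)])
```

The asymmetry is the point of the sketch: rule listings are scoped strictly by level, while the system-wide Dashboard aggregates alerts across all partitions.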

6.2.1 To create error notification rules:

  1. To receive an alert notification when an error occurs, you must specify the address of the user and the delivery channel to use (email, IM, or SMS). These tasks are performed on different pages in Oracle Enterprise Manager Fusion Middleware Control:

    For This Delivery Channel... Perform These Tasks...

    Email

    1. Configure the email addresses on the Workflow Notification Properties page.

      See Configuring Human Workflow Notification Properties.

    2. When complete, click Go to the Messaging Driver page on the Workflow Notification Properties page.

    3. Configure the email driver on the User Messaging Service page.

      See Configuring the Email Driver in Administering Oracle User Messaging Service.

    SMS

    1. Configure the Short Message Peer-to-Peer (SMPP) driver on the User Messaging Service page.

      See Configuring the SMPP Driver in Administering Oracle User Messaging Service.

    IM

    1. Configure the Extensible Messaging and Presence Protocol (XMPP) driver on the User Messaging Service page.

      See Configuring the XMPP Driver in Administering Oracle User Messaging Service.

  2. Create an alert at the appropriate level:

    To create error notification rules at the SOA Infrastructure level:

    From the SOA Infrastructure Menu...

    1. Select Error Notification Rules.

    From the SOA Folder in the Navigator...

    1. Expand SOA.

    2. Right-click soa-infra (server_name).

    3. Select Error Notification Rules.

    To create error notification rules at the individual partition level:

    From the SOA Partition Menu of a Specific Partition...

    1. Select Error Notification Rules.

    From the SOA Folder in the Navigator...

    1. Right-click a specific partition.

    2. Select Error Notification Rules.

    The Error Notification Rules page displays the following details:

    • An Error Notification Rules table for viewing existing rules and details about each rule. Select one or more rules to manage.

    • Links for creating a new rule, creating a new rule from an existing rule, editing a rule, deleting a rule, disabling a rule, and searching for a rule. For more information, click the weblogic icon and select Help > Help for This Page on the Error Notification Rules page.

  3. Create a new rule in either of the following ways:

    1. Click Create to create a new rule.

    or

    1. Click Create Like to create a new rule from a selected rule.

  4. Enter the following information.

    Element Description

    Name

    Enter a name for the rule. Once the new rule is saved, the name cannot be changed. This name is also used for alerts that are displayed on the Dashboard page or sent to the notification recipients through a channel such as email, SMS, or instant messaging (IM).

    Description

    Enter a description for the rule. This description is visible only to administrators. An end user receiving fault notification alerts or viewing alerts on the Dashboard page cannot see this description.

    Schedule Names

    Select a predefined schedule. This indicates how often to trigger the scheduler (for example, invoke the scheduler every two minutes). When you select a schedule, the page is refreshed to display the Schedule Description and Frequency fields.

    You define the schedule names in the Create Schedule page of Oracle Enterprise Manager Fusion Middleware Control.

    1. In the Navigator, expand Scheduling Services > ESSAPP or right-click soa-infra (server_name) and select Define Schedules.

    2. From the Scheduling Service menu, select Job Requests > Define Schedules.

      The schedules available for selection in the Schedule Names list are displayed.

    3. Click Create to create additional schedules and their execution frequency.

      Note: While defining a schedule name, ensure that you specify the schedule package name of /oracle/apps/ess/custom/soa. Otherwise, the schedule is created, but is not accessible on the Create or Edit Error Notification Rule page.

    For more information about using the Oracle Enterprise Scheduler in Oracle Enterprise Manager Fusion Middleware Control, see Administering Oracle Enterprise Scheduler.

    Schedule Description

    Displays the schedule description configured on the Create Schedule page.

    Frequency

    Displays the schedule frequency configured on the Create Schedule page.

  5. Use the IF-THEN table to define the fault notification rule, and click Apply.

    Element Description

    IF

    Define the IF part of the rule. At least one rule condition is mandatory, and cannot be removed.

    • At the SOA Infrastructure level, the mandatory parameter is:

      Fault Occurred in Last 48 Hours

    • At the individual partition level, the mandatory parameters are:

      Fault Occurred in Last 48 Hours

      Partition is partition_name

    You can edit the default value of 48.

    Additional rule conditions are optional. Each condition can be added only once. Once a condition is added, it is removed from the list of available conditions.

    Click the + sign to select rule conditions and assign values. For example, define a rule to trigger an alert if more than 3 faults occur in a 48-hour period in the default partition.

    IF Fault Occurred in Last 48 Hours and

    Partition is default and

    Fault Count is over 3

    THEN

    Define the THEN part of the rule. Any number of THEN conditions can be specified. At least one condition is required. (Send Alerts to Dashboard is a valid condition.)

    • Send Alerts to Dashboard

    Select whether to send an alert to the Fault Alerts section of the Dashboard pages at the SOA Infrastructure or partition levels when the specified fault criteria are met. Use this selection with care to prevent the Dashboard page from overflowing with fault alerts. If you do not select this option, the alert is not displayed on the Dashboard pages.

    • Send Message To User Via Delivery_Channel

    Specify the address of the user to receive the alert notification and the delivery channel to use (email, IM, or SMS). Click the - sign to remove the users. It is your responsibility to ensure that the user contact information you enter is correct.

    Note: You must also configure the notification email properties on the Workflow Notification Properties page, as described in Configuring Human Workflow Notification Properties. The delivery channels must also be configured in the Oracle UMS Adapter, which is accessible from the Workflow Notification Properties page by clicking the Go to the Messaging Driver page link.

    The notification message the alert recipients receive provides the following details. The message content cannot be configured.

    • Fault information. For example:

      16 faults occurred in the last 48 hours

    • A link to the Error Hospital page for viewing details about the faults in this notification alert. From the Error Hospital page, you can drill down to see the individual flow instances and further details about the faults.

    For information about configuring delivery channels in Oracle UMS Adapter, see Administering Oracle User Messaging Service.

    When complete, alert notification rule design looks as follows.

    By default, the alert is enabled. You can disable the alert by selecting the alert on the Error Notification Rules page and clicking Disable. This button acts as a toggle for enabling or disabling one or more selected alerts.

    When error notification rule criteria are met, the alert is triggered and displayed in the Fault Alerts section of the Dashboard page at the SOA Infrastructure or partition level. The frequency with which a rule is invoked is based upon your selection from the Schedule Names list in Step 4.

    1. Click the link that identifies the number of faults.

      The Error Hospital page is displayed.

    2. Click Search.

      The Fault Statistics table shows details about the faults and the Fault Occurred field of the Time filter of the Report Options section is populated with the same time period specified on the Create Error Notification Rules page.

    3. In the Recovery Required column, click the values to perform fault recovery. For more information, see Viewing Error Notification Alerts.

  6. When you receive an error notification alert (for example, an email), click the link in the email to access the Error Hospital page.

    16 Faults occurred in the last 48 hours
     Click the link for more details http://link_to_Error_Hospital_Page
    

For information about assigning alerts in the fault management framework in Oracle JDeveloper, see How to Design a Fault Policy with the Fault Policy Wizard in Developing SOA Applications with Oracle SOA Suite.

For information about roles, see Securing Access to SOA Folders.
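The example rule from Step 5 (faults in the last 48 hours, partition is default, fault count over 3) can be sketched as a simple evaluation function. This is an illustrative model, not the rule engine used by Oracle SOA Suite or Oracle Enterprise Scheduler: the `Fault` and `NotificationRule` types, the `evaluate` function, and the sample partition name `hr` are all hypothetical. The returned text follows the sample notification message shown in this section.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Fault:
    """Hypothetical fault record."""
    partition: str
    occurred_at: datetime


@dataclass(frozen=True)
class NotificationRule:
    """Hypothetical model of the IF part of a rule."""
    window_hours: int          # "Fault Occurred in Last N Hours"
    partition: Optional[str]   # "Partition is ..."; None = all partitions
    fault_count_over: int      # "Fault Count is over N"


def evaluate(
    rule: NotificationRule, faults: Iterable[Fault], now: datetime
) -> Optional[str]:
    """Return the alert text if the rule criteria are met, else None."""
    cutoff = now - timedelta(hours=rule.window_hours)
    matching = [
        f for f in faults
        if f.occurred_at >= cutoff
        and (rule.partition is None or f.partition == rule.partition)
    ]
    if len(matching) > rule.fault_count_over:
        return f"{len(matching)} faults occurred in the last {rule.window_hours} hours"
    return None


now = datetime(2024, 1, 10, 12, 0)
faults = [Fault("default", now - timedelta(hours=h)) for h in (1, 5, 20, 40)]
faults.append(Fault("default", now - timedelta(hours=72)))  # outside the window
faults.append(Fault("hr", now - timedelta(hours=2)))        # other partition

rule = NotificationRule(window_hours=48, partition="default", fault_count_over=3)
message = evaluate(rule, faults, now)
print(message)
```

In the product, the selected schedule determines how often such a check runs. The sketch evaluates the conditions once for a given point in time.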

6.2.2 Error Notification Rules Associated with an Expired Schedule

You cannot enable, disable, or delete a rule when the schedule associated with the rule has expired. The following error message appears:

<Error> <oracle.soa.scheduler> <BEA-000000> <ESS-01054 Cannot hold request 5.
Current state is Finished.
oracle.as.scheduler.IllegalStateException: ESS-01054 Cannot hold request 5.
Current state is Finished.
at weblogic.rmi.internal.ServerRequest.sendReceive(ServerRequest.java:258)
at weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:472)
at 

These actions can be performed if the rule has an active schedule.