4 Monitoring Oracle Service Bus Alerts
4.1 Introduction to Oracle Service Bus Alerts
Oracle Service Bus lets you define two different types of alerts for service components: service level agreement (SLA) alerts and pipeline alerts. For both types of alerts, you can specify alert destinations, such as email addresses and JMS queues.
You define SLA alert rules in the Oracle Service Bus Console, and you define pipeline alert rules in either the Oracle Service Bus Console or JDeveloper. The following figure shows the Service Bus Service Health page, with a list of services that have generated alerts.
4.1.1 Alerts on the Service Bus Dashboard
In Fusion Middleware Control, you can monitor domain-wide SLA and pipeline alerts on the Service Bus Dashboard page. This page displays information about all alerts that occurred on the domain within the specified interval or since the last time the statistics were reset. The Dashboard includes the following information:
-
A pie chart illustrating the breakdown of alerts by severity for the specified period.
-
The top 10 services with the specified type of alert in the current aggregation interval, listed in descending order.
-
A table that lists and describes the alerts represented by the pie chart.
-
A table that lists the services with the most errors.
The alerts listed on the page are the alerts that are represented in the pie chart. You can click on the name of an alert or service in any of the tables on this page to view more information, or click on a section of the pie chart to view additional information about alerts of the specified severity.
Alerts can be sent to multiple alert destinations, including email addresses, JMS queues, and SNMP traps. The destinations for an alert are defined in an alert destination resource, which is associated with the alert in Service Bus. For more information about alert destinations, see "Working with Alert Destinations" in Developing Services with Oracle Service Bus.
4.1.2 Alerts and Operational Settings
Alerts are only generated if alerting and monitoring are enabled for the Service Bus domain. For SLA alerts, SLA alerting must be enabled for both the individual service and the domain. For pipeline alerts, pipeline alerting must be enabled for both the individual pipeline and the domain. For more information about operational settings, see Configuring Operational and Global Settings.
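The enablement rule above can be read as a simple conjunction of flags. The following is a minimal sketch of that reading, not an Oracle Service Bus API: the `OperationalSettings` class and its field names are hypothetical, and the requirement that monitoring also be enabled at the service level is taken from the later sections of this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationalSettings:
    """Hypothetical flags for illustration; not an Oracle Service Bus API."""
    monitoring: bool
    sla_alerting: bool
    pipeline_alerting: bool


def sla_alerts_generated(domain, service):
    """SLA alerts need monitoring and SLA alerting at both levels."""
    return (domain.monitoring and domain.sla_alerting
            and service.monitoring and service.sla_alerting)


def pipeline_alerts_generated(domain, pipeline):
    """Pipeline alerts need monitoring and pipeline alerting at both levels."""
    return (domain.monitoring and domain.pipeline_alerting
            and pipeline.monitoring and pipeline.pipeline_alerting)


# Domain has pipeline alerting switched off; the service has everything on.
domain = OperationalSettings(monitoring=True, sla_alerting=True, pipeline_alerting=False)
service = OperationalSettings(monitoring=True, sla_alerting=True, pipeline_alerting=True)

print(sla_alerts_generated(domain, service))       # True
print(pipeline_alerts_generated(domain, service))  # False
```

In this sketch a setting that is disabled at the domain level suppresses alerts regardless of the service-level setting.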
4.2 About Service Level Agreement Alerts
The purpose of SLA alerts is to inform the operations team of issues relating to the health of Service Bus services or to the quality of service provided.
SLA alert rules trigger alerts for proxy services, business services, pipelines, and split-joins based on the conditions you define for each service. You can configure these alerts when you create Service Bus resources for a project. When you create an alert rule, you define the name, description, summary, duration, severity, frequency, and state of the alert rule. You also define one or more conditions that trigger an alert based on the rule. Conditions can include message or error counts, response times, failure or success ratios, and endpoint URI status.
SLA alerts are automated responses to SLA alert rule violations and are displayed on the Dashboard and the Alert History page. You define alert rules to specify unacceptable service performance according to your business and performance requirements. When you create an SLA alert rule, you can specify the daily operating times for alerts and a date on which the alert rule expires. You can also specify the aggregation interval for the alerts generated by the rule. The alert aggregation interval set for the alert is not affected by the aggregation interval set for the service. For more information about aggregation intervals, see Aggregation Intervals.
For a service for which monitoring is enabled, an alert rule is evaluated at discrete intervals. Once an alert rule is created, it is first evaluated at the end of the aggregation interval, and after that at the end of each sample interval. For example, if the aggregation interval of an alert rule is five minutes, it is evaluated five minutes after it is created, and then every minute after that (since the sample interval for five minutes is one minute).
If a rule evaluates to false, no alert is generated. If the rule evaluates to true, alert generation is governed by the Alert Frequency. If the frequency is Every Time, an alert is generated every time the alert rule evaluates to true. If the frequency is Notify Once, an alert is generated only if no alert was generated in the previous evaluation. In other words, an alert is generated the first time the alert rule evaluates to true, and no more notifications are generated until the condition resets itself and evaluates to true again.
4.2.1 SLA Alert Severity Levels
When you create an SLA or pipeline alert rule, you specify the severity of the alert. These levels have no concrete meaning within Service Bus; you define what they mean for your specific implementation. Alerts can have the following levels of severity:
-
Normal
-
Warning
-
Minor
-
Major
-
Critical
-
Fatal
4.2.2 Aggregation Intervals
The aggregation interval determines the frequency at which the monitoring system tests the alert condition. The condition is tested each time the monitoring subsystem aggregates enough samples of data to constitute one aggregation interval. For example, if you select an aggregation interval of 1 hour, the condition is tested each time an hour's worth of data is available. The first time the condition is tested is at the end of the first hour. After that, the condition is tested every 10 minutes because the sampling interval for an aggregation interval of 1 hour is set to 10 minutes.
You specify the aggregation interval for an alert rule when you create and configure the rule. This aggregation interval is not affected by the aggregation interval set for the service.
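The evaluation schedule described in this section can be sketched as follows. This is an illustration only: the function `evaluation_times` is hypothetical, and the sample interval is passed in explicitly because this chapter states it for only two cases (10 minutes for a 1-hour aggregation interval, 1 minute for a 5-minute aggregation interval).

```python
def evaluation_times(aggregation_minutes, sample_minutes, horizon_minutes):
    """Return the minutes after rule creation at which the rule is evaluated.

    The first evaluation happens once a full aggregation interval of data
    exists; later evaluations follow at every sample interval.
    """
    if sample_minutes <= 0 or aggregation_minutes < sample_minutes:
        raise ValueError("sample interval must be positive and no longer "
                         "than the aggregation interval")
    return list(range(aggregation_minutes, horizon_minutes + 1, sample_minutes))


# 1-hour aggregation interval, 10-minute sample interval, first 90 minutes.
print(evaluation_times(60, 10, 90))  # [60, 70, 80, 90]

# 5-minute aggregation interval, 1-minute sample interval, first 8 minutes.
print(evaluation_times(5, 1, 8))     # [5, 6, 7, 8]
```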
4.2.3 SLA Alert Frequencies
You can specify that an alert be generated every time the alert rule condition is met or only the first time it is met. When an alert rule generates an alert each time a condition is met, the actions included in the alert rule are executed every time the alert rule evaluates to true. For example, if you define a condition that the average response time is greater than 300 milliseconds, you receive an alert every time this condition evaluates to true.
The number of times an alert rule is evaluated depends on the aggregation interval and the sample interval associated with that rule. If the aggregation interval is set to 5 minutes, the sample interval is 1 minute. Rules are evaluated each time 5 samples of data are available. Therefore, the rule is evaluated for the first time approximately 5 minutes after it is created and every minute thereafter.
When an alert rule is configured to generate an alert only once, the actions included in the rule are executed the first time the rule evaluates to true, and no more alerts are generated until the condition resets itself and evaluates to true again. For example, if you define a condition that the average response time is less than 300 milliseconds, you receive an alert the first time this condition evaluates to true, but you do not receive any more alerts until the condition evaluates to false and then to true again. The alert timestamp is updated and displayed on the Dashboard.
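The difference between the two frequencies can be illustrated with a short simulation. This is a sketch of the behavior as described above, not Service Bus code: the function name and the labels `"every_time"` and `"notify_once"` are hypothetical, and the input is a list of successive rule evaluation results.

```python
def alerts_generated(evaluations, frequency):
    """Return, for each rule evaluation, whether an alert is generated.

    evaluations: successive rule results (True means the condition is met).
    frequency: "every_time" or "notify_once" (labels chosen for this sketch).
    """
    fired = []
    previous = False  # result of the previous evaluation
    for result in evaluations:
        if frequency == "every_time":
            fire = result
        elif frequency == "notify_once":
            # Fire only when the condition changes from false to true.
            fire = result and not previous
        else:
            raise ValueError(f"unknown frequency: {frequency}")
        fired.append(fire)
        previous = result
    return fired


results = [False, True, True, False, True]
print(alerts_generated(results, "every_time"))   # [False, True, True, False, True]
print(alerts_generated(results, "notify_once"))  # [False, True, False, False, True]
```

With the notify-once setting, the second consecutive true result produces no alert, and the alert fires again only after the condition has evaluated to false in between.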
4.2.4 SLA Alert Statistics
When you define alert conditions for SLA alert rules, you select the measure used to evaluate the condition: Count, Minimum, Maximum, Average, or Status. The list of statistics you can select depends on the measure you choose. For example, if you select Minimum, Maximum, or Average, the Response Time statistic is available. The statistics available also depend on the configuration of the service itself. The number of statistics varies according to whether a service has pipelines, route nodes, operations, and so on.
The following sections list and describe the available statistics for each measure.
4.2.4.1 Count Statistic Details
The following table describes Count Statistic details.
Table 4-1 Count Statistic Details
Statistic | Description |
---|---|
Cache Hit Count | For business services that use result caching, this statistic increments each time the cache is used to return a response to a client. |
Error Count | The number of errors. This number is incremented each time message processing returns a failure. |
Failover Count | For business services only, the number of times failover occurs. |
Failure Ratio (%) | The ratio of errors encountered to the total number of messages successfully processed over the specified aggregation interval. |
Message Count | The total number of messages processed. |
Success Ratio (%) | The ratio of messages successfully processed to the total number of messages encountered over the specified aggregation interval. |
| | For pipelines only, the number of erroneous messages processed by the request pipeline. |
| | For pipelines only, the number of messages processed by the request pipeline. |
| | For pipelines only, the number of erroneous messages processed by the response pipeline. |
| | For pipelines only, the number of messages processed by the response pipeline. |
Validation Error Count | For proxy services that have a validate action in the pipeline, the number of validation errors. For pipelines, this statistic is named validation-errors. |
WSS Error Count | This operand is available depending on the transport for the service (such as with HTTP). It is the number of Web Service Security (WSS) erroneous messages processed. This counter is only available for WSDL-based services and is updated when a WSS error is encountered. |
Uri: | These operands set alerts for business process endpoint URIs. For information on how to generate alerts based on endpoint URIs, see Monitoring and Managing Endpoint URIs for Business Services. |
4.2.4.2 Maximum, Minimum, and Average Statistic Details
The following table describes Maximum, Minimum, and Average Statistic details.
Table 4-2 Maximum, Minimum, and Average Statistic Details
Statistic | Description |
---|---|
| | For pipelines only, the length of time it takes the request pipeline to process each message. |
| | For pipelines only, the length of time it takes the response pipeline to process each message. |
Elapsed Time | For pipelines and split-joins only, the length of time it takes to process each request or response. |
Response Time | The length of time in milliseconds it takes to process each request or response. |
Throttling Time | For business services only, the length of time a message processed by a business service configured for throttling spent in the throttling queue. |
Uri: | This operand sets alerts for business process endpoint URIs. For information on how to generate alerts based on endpoint URIs, see Monitoring and Managing Endpoint URIs for Business Services. |
4.2.4.3 Status Statistic Details
The status statistics only apply to business services, and you can use them to base your conditions on whether the business service's endpoint URI is online or offline.
The following table describes Status Statistic details.
Table 4-3 Status Statistic Details
Statistic | Description |
---|---|
All URIs Offline | Evaluates to true if all URIs in the cluster are offline. |
All URIs Online | Evaluates to true if all URIs in the cluster are online. |
Any URIs Offline | Evaluates to true if any URIs in the cluster are offline. |
Any URIs Online | Evaluates to true if any URIs in the cluster are online. |
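
The four status statistics in the table above reduce to "all" and "any" checks over the endpoint URI states. The sketch below assumes a hypothetical dictionary that maps each endpoint URI to an online flag; the URIs and function names are made up for illustration and are not part of Service Bus.

```python
def all_uris_offline(status):
    return all(not online for online in status.values())


def all_uris_online(status):
    return all(status.values())


def any_uris_offline(status):
    return any(not online for online in status.values())


def any_uris_online(status):
    return any(status.values())


# Hypothetical endpoint URIs of one business service: True means online.
endpoint_status = {
    "http://host-a.example/orders": True,
    "http://host-b.example/orders": False,
}

print(all_uris_offline(endpoint_status))  # False
print(all_uris_online(endpoint_status))   # False
print(any_uris_offline(endpoint_status))  # True
print(any_uris_online(endpoint_status))   # True
```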
4.3 About Pipeline Alerts
Pipeline alerts are triggered based on message context rather than a set of predefined conditions.
You define pipeline alerts directly in the pipeline message flow by adding and configuring an alert action. Pipeline alert actions generate alerts based on the message context in a pipeline, and can be configured to include an alert name, description (which can include message elements, such as $order), alert destination, and alert severity. Unlike SLA alerts, notifications generated by a pipeline alert action are primarily intended for business purposes or to report errors, and not for monitoring system health.
To define conditions under which a pipeline alert is triggered, use the conditional constructs available in the pipeline editor such as XQuery Editor or an if-then-else construct. You have complete control over the alert body, including the context variables, and you can extract the portions of the message to include in the alert. In addition to viewing pipeline alerts in Fusion Middleware Control, you can also select an alert destination to send notifications through email or JMS destinations.
For more information, see "Adding Alert Actions" in Developing Services with Oracle Service Bus.
4.3.1 A Sample Use Case for Pipeline Alerts
A typical use for a pipeline alert is to notify you when special business conditions are encountered in a message flow. You can configure an alert action in a pipeline to raise alerts when such predefined conditions are encountered. You can also configure email and JMS alert destinations to receive a notification of the alert, and send the details to the alert recipient in the form of a payload.
For example, you want to be notified when an order exceeding $10 million is routed to a pipeline that routes orders to a purchase order website. You can create an alert action in the appropriate place in the pipeline that defines the condition of exceeding $10 million, and then configure an email alert destination as the target destination in the alert action. You can configure the content of the alert, and can also include the details of the order in the form of a payload.
Pipeline alerting can also be used to detect errors in a message flow. For example, when a proxy service validates the input documents, you may want to be notified when the validation fails so you can contact the client to fix the problem. For this you must configure an alert action within the error handler for the pipeline. In the action, you can include the actual error message in the fault variable and other details in the SOAP header, to be sent as the payload. You can also configure additional alert destinations using an alert destination resource in the alert action.
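The large-order use case above can be modeled outside Service Bus to show the shape of the logic. In a real pipeline the condition would be expressed with the pipeline editor's conditional constructs and an alert action; the Python below is only a sketch of that logic, and the order fields, alert name, severity, and destination label are all hypothetical.

```python
from decimal import Decimal

ORDER_THRESHOLD = Decimal("10000000")  # $10 million, from the example in the text


def order_alert(order, threshold=ORDER_THRESHOLD):
    """Return alert content if the order total exceeds the threshold, else None.

    The order is a dict with hypothetical "id" and "total" keys.
    """
    total = Decimal(str(order["total"]))
    if total <= threshold:
        return None  # condition not met, so no alert is raised
    return {
        "name": "LargeOrderAlert",   # hypothetical alert name
        "severity": "Major",         # one of the severity levels listed earlier
        "destination": "email",      # stands in for an alert destination resource
        "description": f"Order {order['id']} exceeds {threshold}",
        "payload": dict(order),      # order details sent as the payload
    }


print(order_alert({"id": "PO-1001", "total": "12500000.00"}))
print(order_alert({"id": "PO-1002", "total": "950.00"}))  # None
```

The same pattern applies to the validation-failure case: the alert content would be built in the error handler and would carry the fault details as the payload.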
4.4 Enabling and Disabling Alerts
To raise an SLA or pipeline alert, you first define the alert rules and then enable alerting and monitoring at both the service level and the global level.
For example, to enable SLA alerts for a proxy service, you must define the alert rules for that service in Oracle Service Bus Console, enable SLA alerting and monitoring for that proxy service, and enable SLA alerting and monitoring globally for the Service Bus domain. The last two steps are performed in Fusion Middleware Control. The same steps apply to enabling pipeline alerts.
For more information about how to configure operational settings for services, see Viewing and Configuring Operational Settings. The Alert History panel contains a customizable table displaying information about violations or occurrences of events in the system.
4.5 Creating Service Level Agreement Alert Rules
Creating SLA alert rules is a two-step process. First, you configure properties for the alert rule, such as how and when the rule is evaluated, any email or JMS destinations for alerts generated from the rule, and the severity of the generated alerts. Once the properties are configured, you can specify the conditions that, when met, generate the alerts.
Note:
When a service is created from another service, alert rules are maintained in the following way:
-
When a proxy service is created from a business service or a business service is created from a proxy service, the alert rules, if any, are removed.
-
When a proxy service is created from another proxy service or a business service is created from another business service, the alert rules, if any, are retained.
4.5.1 Before You Begin
If you want the alerts generated by the SLA alert rule to be sent to email addresses or JMS queues for notifications, you must create an alert destination that defines those destinations. For more information, see "Working with Alert Destinations" in Developing Services with Oracle Service Bus.
4.5.3 Defining SLA Alert Rule Conditions
You must define at least one condition, which consists of a simple expression. If you specify multiple conditions, use the And/Or operators to combine them. For more information about the measures and statistics you can use to define conditions, see SLA Alert Statistics.
Figure 4-2 Create SLA Alert Rule Wizard - Rule Condition
Description of "Figure 4-2 Create SLA Alert Rule Wizard - Rule Condition"
These instructions assume you completed Configuring SLA Alert Rule Properties and are on the Rule Condition page of the Create SLA Alert Rule wizard.
To define the conditions for an SLA alert rule:
-
On the Rule Condition page of the Create SLA Alert Rule wizard, select the time period for the Condition Aggregation Interval.
For more information, see Aggregation Intervals.
-
To define a condition, do the following:
-
If there is no template row in the table, click Add a New Condition above the Condition Builder table.
A new row appears in the table.
-
From the first list of options, select the type of measure to use to evaluate the statistic.
Select from Count, Minimum, Maximum, Average, or Status.
Note:
Status is only available for business services, and lets you base a condition on whether the endpoint URI is online or offline.
-
From the second list of options, select the statistic to evaluate, such as Error Count, Failure Ratio (%), Response Time, and so on.
The available statistics vary based on the type of measure you selected. For more information, see SLA Alert Statistics.
-
From the third list of options, select a comparison operator: =, !=, >, or <.
-
In the field after the comparison operator, enter the value to compare the actual statistic against. If the condition is based on endpoint URI status, select True or False.
-
If the condition is based on endpoint URI status, select whether to evaluate the rule for all servers or on any server.
-
To the right of the row, click Update the Condition.
-
Repeat the above steps until you have added all the conditions you want to include in the alert rule.
-
To join the conditions you defined, select the conditions to join, click Join Selected Conditions, and then select either the And or Or operator.
The conditions you selected are combined into one complex expression.
Note:
You can join several conditions at once if they should be at the same level and should be joined by the same operator. If you join a group of conditions, and then join the resulting complex expression with yet another condition, that complex expression is nested within parentheses in the final complex expression.
-
To revert a complex expression back to its original separate conditions, select the complex expression and click Split Selected Condition.
-
When you are done configuring properties and creating conditions, click Create.
The new alert rule appears in the summary table, as shown in the following figure.
Figure 4-3 SLA Alert Rules Tab with Alert Rules Defined
Description of "Figure 4-3 SLA Alert Rules Tab with Alert Rules Defined"
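The way simple conditions are joined and nested in the procedure above can be sketched with two small classes. This is an illustration of the idea only: the class names, the statistics lookup keyed by measure and statistic, and the rendered expression syntax are assumptions, and the console may display expressions differently.

```python
import operator
from dataclasses import dataclass

COMPARATORS = {"=": operator.eq, "!=": operator.ne,
               ">": operator.gt, "<": operator.lt}


@dataclass(frozen=True)
class Condition:
    """A simple expression: measure, statistic, comparison operator, value."""
    measure: str
    statistic: str
    comparator: str
    value: float

    def render(self, nested=False):
        return f"{self.measure} {self.statistic} {self.comparator} {self.value}"

    def evaluate(self, stats):
        actual = stats[(self.measure, self.statistic)]
        return COMPARATORS[self.comparator](actual, self.value)


@dataclass(frozen=True)
class Joined:
    """Conditions at the same level joined by one operator ("and" or "or")."""
    joiner: str
    parts: tuple

    def render(self, nested=False):
        text = f" {self.joiner} ".join(p.render(nested=True) for p in self.parts)
        # A joined group that is joined again is wrapped in parentheses.
        return f"({text})" if nested else text

    def evaluate(self, stats):
        results = [p.evaluate(stats) for p in self.parts]
        return all(results) if self.joiner == "and" else any(results)


errors = Condition("Count", "Error Count", ">", 10)
slow = Condition("Average", "Response Time", ">", 300)
quiet = Condition("Count", "Message Count", "<", 5)

rule = Joined("or", (Joined("and", (errors, slow)), quiet))
print(rule.render())
# (Count Error Count > 10 and Average Response Time > 300) or Count Message Count < 5

stats = {("Count", "Error Count"): 12,
         ("Average", "Response Time"): 250,
         ("Count", "Message Count"): 3}
print(rule.evaluate(stats))  # True
```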
4.6 Updating SLA Alert Rules
Once you create an SLA alert for a service, you can modify the rule properties and conditions. You can also delete an SLA alert.
4.7 Monitoring SLA and Pipeline Alerts
When monitoring SLA and pipeline alerts in Fusion Middleware Control, you can view statistics for all alerts in the domain or a subset of alerts, view detailed information about a specific alert, and delete or purge alerts. You can also view how the alert rule is configured for a specific alert.
4.7.1 Enabling Alert Reporting
In order to monitor SLA alerts, both monitoring and SLA alerts must be enabled at the global level. For information about configuring global settings, see Configuring Operational Settings at the Global Level.
Once the global settings are configured, you also need to enable monitoring and SLA alerts for each service for which you want to monitor SLA alerts. You can also specify the alert level. For instructions, see the following topics:
4.7.3 Filtering SLA and Pipeline Alerts
The Service Bus Alert History page lets you search for the specific alerts you want to view. Alerts are stored using the WebLogic Diagnostics Framework, which provides its own query language, including wildcard characters. For filtering alerts in the alert history, use the syntax described in "WLDF Query Language" in Configuring and Using the Diagnostics Framework for Oracle WebLogic Server.
From the Dashboard page, you can also click on a specific area in the SLA or pipeline alerts pie chart to display the Alert History page for alerts with the chosen level of severity and alert history duration.
Note:
This page also appears when you select an alert from the Dashboard page.
To perform a search for alerts from the Service Bus home page:
4.7.4 Viewing SLA or Pipeline Alert Details
When an alert appears in the Alert History table of the Service Bus Dashboard or Alert History page, you can click the name of the alert to view more information about it.
4.7.4.1 Viewing Alert Details on the Service Bus Dashboard
You can view alert details on the Service Bus Dashboard.
To view alert details on the Dashboard:
4.7.5 Viewing the Alert Rule Configuration
You can view the configuration of the actual rule that triggered an alert. The Alert Rule dialog displays the following configuration information for a rule:
-
The name of the alert rule (for SLA alerts only)
-
The name and description of the rule
-
The expiration time of the rule
-
Whether the rule is enabled or disabled
-
The alert severity
-
The alert frequency
-
Whether processing for the rule is stopped after generating an alert
-
The aggregation interval
-
The condition expression
4.7.5.1 Viewing the Alert Rule Configuration on the Service Bus Dashboard
You can view the alert rule configuration on the Service Bus Dashboard.
To view the alert rule configuration on the Dashboard:
4.7.6 Deleting an SLA or Pipeline Alert
Once you display alerts on the Service Bus Alert History page, you can delete those alerts individually.
To delete an alert: