User Guide

Monitoring

BEA AquaLogic Service Bus provides the capability to monitor and collect run-time information for systems operations purposes. AquaLogic Service Bus aggregates run-time statistics that you can view on a customizable Dashboard. The Dashboard allows you to monitor the health of the system and alerts you to problems in your messaging services. With this information, you can quickly and easily isolate and diagnose problems as they occur.

This chapter includes the following sections:

 


Monitoring Scenarios

The following sections describe some of the tools and functionality available in the AquaLogic Service Bus Console for monitoring messages and system operations.

Operational Health

The Dashboard page in the AquaLogic Service Bus Console lets you view the state of all servers and monitored services at a glance. The Dashboard displays two pie charts, a table, and several links. The Service Summary pie chart shows, by severity, the percentage of alerts issued for all services in the past 30 minutes. The Server Summary pie chart shows the current status of every server in the AquaLogic Service Bus domain. From the Server Summary panel, you can also drill down and view the domain logs, which are grouped by severity.

In addition to the pie charts, these summaries include lists of the most active services and the most critical servers. The services list displays up to ten services, with fully qualified service names, in descending order of the number of alerts. The most critical servers list displays the ten most critical servers, based on the health of the running servers as defined by the WebLogic Diagnostic Service. For more information about the WebLogic Diagnostic Service, see Configuring and Using the WebLogic Diagnostics Framework.

From each of the summaries, you can drill down into more detail by clicking a specific area on a pie chart or by clicking one of the links on the page.

The Alert Summary table lists the alerts that were issued in the past 30 minutes. This table contains the following fields:

You can customize the layout of this table.

Monitoring Alerts

When you log in to the AquaLogic Service Bus Console, you may see a list of alerts on the Dashboard. This display is refreshed dynamically. These alerts can be the result of SLA violations or of pipeline alerts. Service Level Agreements (SLAs) are agreements that define the precise level of service expected from AquaLogic Service Bus business and proxy services.

Each row of the table displays the information that you have configured, such as the severity, timestamp, and associated service. Click the severity link to display more details that help you analyze the cause of the alert.

You can configure the following alert destinations: the Console, e-mail, JMS, reporting, and SNMP traps. For example, you can choose e-mail or JMS as an additional or replacement destination for alert notification.

Monitoring Statistics

Monitoring statistics show you how many messages a particular service has processed successfully and how many have failed. To access this information, open the Service Monitoring Summary page from the Dashboard and filter the display for the relevant service. Besides the number of messages that have been processed successfully or have failed, this page shows the project that the service belongs to, the average execution time of message processing, and the number of alerts associated with the service. You can view monitoring statistics for the current aggregation interval, for the period since you last reset statistics for this service, or for the period since you last reset statistics for all services.

You use the Global Settings page in the System Administration module of the AquaLogic Service Bus Console to reset statistics. When you do this, make sure you are not in a WebLogic session on the WebLogic Server Administration Console.

Clicking the name of the service brings you to that service's Service Monitoring Details page. This page provides additional information such as the minimum and maximum response times and the overall average time it takes for the service to execute a message, the success-failure ratio, the number of messages that have failed because of security or validation errors, and the number of messages associated with proxy service components (pipelines and route nodes). You can view this information for specific operations associated with the service. Again, you can view these statistics for the period of the current aggregation interval or you can display the statistics for the period since you last reset statistics for this service or since you last reset statistics for all services.
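To make the relationship between these figures concrete, here is a minimal Python sketch of how per-service counters could yield the values shown on the Service Monitoring Details page: message count, minimum, maximum, and average response time, and the success ratio. ServiceStats and its fields are hypothetical names for illustration; this is not how AquaLogic Service Bus stores statistics, and the sketch assumes that failed messages also count toward response times.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceStats:
    """Hypothetical per-service counters, loosely modeled on the fields of the
    Service Monitoring Details page. Not an AquaLogic Service Bus API."""

    successes: int = 0
    failures: int = 0
    total_time_ms: float = 0.0
    min_time_ms: Optional[float] = None
    max_time_ms: Optional[float] = None

    def record(self, elapsed_ms: float, ok: bool) -> None:
        """Record one processed message (assumption: failures count toward timings)."""
        self.total_time_ms += elapsed_ms
        if self.min_time_ms is None or elapsed_ms < self.min_time_ms:
            self.min_time_ms = elapsed_ms
        if self.max_time_ms is None or elapsed_ms > self.max_time_ms:
            self.max_time_ms = elapsed_ms
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    @property
    def message_count(self) -> int:
        return self.successes + self.failures

    @property
    def average_time_ms(self) -> float:
        # Overall average; 0.0 when nothing has been processed yet.
        return self.total_time_ms / self.message_count if self.message_count else 0.0

    @property
    def success_ratio(self) -> float:
        return self.successes / self.message_count if self.message_count else 0.0


stats = ServiceStats()
for elapsed, ok in [(100.0, True), (300.0, False), (200.0, True)]:
    stats.record(elapsed, ok)
print(stats.message_count, stats.min_time_ms, stats.max_time_ms, stats.average_time_ms)  # 3 100.0 300.0 200.0
```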

Verifying Service Level Agreements

Consider the following use case to verify the service level agreements:

Assume that a particular proxy service is generating many SLA violation alerts because of slow response times. To investigate the problem, you log in to the AquaLogic Service Bus Console and examine the detailed statistics for the proxy service. At this level, you can identify that a pipeline stage that invokes a third-party web service is taking a long time and is the actual bottleneck. After renegotiating service-level characteristics with the third-party web service provider, you can configure alert metrics to track the provider's compliance with the new agreement terms. In this way, you can use alerts as the basis for negotiating Service Level Agreements.
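The drill-down in this use case amounts to comparing a response time against the SLA threshold and then looking for the slowest pipeline stage. The following sketch illustrates that reasoning under simplifying assumptions: the stage names and timings are hypothetical, and the service response time is approximated as the sum of the per-stage averages, whereas the Console reports service-level averages directly.

```python
from typing import Dict, Optional


def diagnose(stage_avg_ms: Dict[str, float], sla_max_ms: float) -> Optional[str]:
    """Return the slowest stage if the summed stage averages exceed the SLA,
    otherwise None. Summing stage averages is a simplifying assumption."""
    total = sum(stage_avg_ms.values())
    if total <= sla_max_ms:
        return None
    # The stage with the largest average time is the most likely bottleneck.
    return max(stage_avg_ms, key=stage_avg_ms.get)


# Hypothetical per-stage averages for the proxy service in the use case.
stages = {"validate-order": 15.0, "transform-request": 40.0, "invoke-partner-ws": 2300.0}
print(diagnose(stages, sla_max_ms=1000.0))  # invoke-partner-ws
```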

Pipeline Alert Action

You can also generate alerts from within a pipeline stage by using the Alert action, which is in the Reporting category of the Actions menu.

You define the conditions under which a pipeline alert is triggered by using the conditional constructs available in the Pipeline Editor, such as the XQuery Editor or an if-then-else construct. In the Alert action, you use an Alert Destination resource to define where the alert is sent. You have complete control over the alert body, which can include pipeline context variables, and you can extract portions of the message for inclusion in the alert.

You can obtain an integrated view of all the alerts generated by a service on the Dashboard page in the AquaLogic Service Bus Console.

Note: For more information on adding an Alert action to a stage, see Proxy Services: Actions-Alert in Using the AquaLogic Service Bus Console.

Alert Destination

You can view the alerts by using one of the following alert destinations:

AquaLogic Service Bus Console

The Dashboard shows information about the overall health of AquaLogic Service Bus. It provides an overview of the state of the system organized by server, services, and alerts.

After monitoring is enabled, the Service Monitoring Summary page in the AquaLogic Service Bus Console provides a view of the statistics collected for each service. It provides information about the alerts generated due to SLA violations or as a result of alert actions configured in the pipeline.

As previously mentioned, an SLA is an agreement that defines the precise level of service expected from business and proxy services in AquaLogic Service Bus. The SLA Manager, with the help of the AquaLogic Service Configuration module, allows users to configure SLA rule conditions and actions. The SLA Manager monitors SLA violations using the data provided by the Aggregator and sends notifications as configured in the alert rule actions. The SLA Manager is always deployed with the Aggregator and resides on only one managed server in a cluster. The SLA Manager sends alerts to the Alert Log to be stored in the Alert Store.

E-mail Alert Destination

E-mail is one of the available alert destinations. To configure this alert destination, you must first configure the SMTP Server global resource. This resource captures the address and port number of the SMTP server for your e-mail destination and, if required, the authentication credentials. The authentication credentials are stored inline rather than as a service account. The Alert action uses the SMTP resource to send outbound e-mail messages, and you can use the same SMTP resource to send both pipeline alerts and SLA alerts. When an alert is delivered by e-mail, metadata describing the alert is prefixed to the configured payload.
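As an illustration of metadata being prefixed to the payload, the following sketch uses Python's standard email package to compose, but not send, such a message. The metadata field names, addresses, and subject line are hypothetical; the actual layout of AquaLogic Service Bus alert e-mail may differ.

```python
from email.message import EmailMessage
from typing import Dict


def build_alert_email(metadata: Dict[str, str], payload: str,
                      sender: str, recipient: str) -> EmailMessage:
    """Compose (but do not send) an alert e-mail whose body is the alert
    metadata followed by the configured payload. Field names are hypothetical."""
    metadata_block = "\n".join(f"{key}: {value}" for key, value in metadata.items())
    msg = EmailMessage()
    msg["Subject"] = "Alert: " + metadata.get("Alert Name", "unnamed")
    msg["From"] = sender
    msg["To"] = recipient
    # Metadata first, then a blank line, then the payload.
    msg.set_content(metadata_block + "\n\n" + payload)
    return msg


msg = build_alert_email(
    {"Alert Name": "SlowResponse", "Severity": "Major", "Service": "OrderProject/OrderPS"},
    "<order id='42'/>",
    "alsb-alerts@example.com",
    "ops@example.com",
)
print(msg.get_content())
```

To actually send it, you could pass the message to smtplib.SMTP(host, port).send_message(msg), using the host, port, and credentials configured in your SMTP Server resource.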

Note: For more information on the SMTP Server resource, see Overview of SMTP Servers in Using the AquaLogic Service Bus Console.

SNMP Traps

Simple Network Management Protocol (SNMP) traps allow third-party software to monitor Service Level Agreements (SLAs) within AquaLogic Service Bus. When alert notification over SNMP is enabled, Web Services Management (WSM) and Enterprise Service Management (ESM) tools can monitor SLA violations by monitoring alert notifications.

SNMP is an application-layer protocol that allows information about the management of a resource to be exchanged across a network. It enables you to monitor a resource and, if required, take corrective action based on the data obtained from the resource. This version of AquaLogic Service Bus supports both SNMP version 1 and SNMP version 2. SNMP is made up of the following components:

Managed Resource

This is the resource being monitored. The resource and its attributes are added to the Management Information Base (MIB).

Management Information Base (MIB)

The Management Information Base (MIB) is a hierarchical data structure that stores all the resources to be monitored, along with the monitored attributes of those resources. Each resource is given a unique identifier called the Object Identifier (OID). You can use SNMP commands to retrieve information about the management of a resource. The following section illustrates the WebLogic Server MIB.

An Illustration of WebLogic Server MIB

The WebLogic Server installer creates a copy of the MIB in the following location:

<BEA_HOME>/weblogic92/server/lib/BEA-WEBLOGIC-MIB.asn1

where <BEA_HOME> is the directory in which you installed the WebLogic Server. WebLogic Server exposes thousands of data points in its management system. To organize this data it provides a hierarchical data model that reflects the collection of services and resources that are available in a domain. Figure 3-1 illustrates the hierarchy of objects in the MIB.

Figure 3-1 Hierarchy of Objects in MIB

For example, if you create two managed servers, MS1 and MS2, in a domain, the MIB contains one serverTable object, which in turn contains one serverName object. The serverName object contains two instances, with the values MS1 and MS2. The MIB assigns a unique number, called an object identifier (OID), to each managed object. Once an OID is assigned, you cannot change it. Each OID consists of a sequence of integers that defines the location of the object in the MIB tree. Each node in the path has both a number and a name associated with it.
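The following toy model shows how an OID is a path of numbered, named nodes, using the serverTable and serverName example above. All arc numbers and the root name are invented for illustration; the real OIDs are defined in BEA-WEBLOGIC-MIB.asn1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MibNode:
    """One named, numbered node in a toy MIB tree (hypothetical arc numbers)."""
    name: str
    number: int
    children: Dict[int, "MibNode"] = field(default_factory=dict)
    instances: Dict[int, str] = field(default_factory=dict)  # instance index -> value

    def add_child(self, name: str, number: int) -> "MibNode":
        child = MibNode(name, number)
        self.children[number] = child
        return child


def names_for(root: MibNode, oid: Tuple[int, ...]) -> List[str]:
    """Translate a numeric OID path into the names of the nodes it passes through."""
    if not oid or oid[0] != root.number:
        raise KeyError(oid)
    node, names = root, [root.name]
    for arc in oid[1:]:
        node = node.children[arc]
        names.append(node.name)
    return names


def get_instance(root: MibNode, oid: Tuple[int, ...]) -> str:
    """Resolve an instance OID: the object path followed by one instance index."""
    if len(oid) < 2 or oid[0] != root.number:
        raise KeyError(oid)
    node = root
    for arc in oid[1:-1]:
        node = node.children[arc]
    return node.instances[oid[-1]]


# Toy tree mirroring the example: serverTable -> serverName -> {MS1, MS2}.
root = MibNode("toyWebLogicMib", 1)
server_name = root.add_child("serverTable", 10).add_child("serverName", 15)
server_name.instances.update({1: "MS1", 2: "MS2"})

print(names_for(root, (1, 10, 15)))        # ['toyWebLogicMib', 'serverTable', 'serverName']
print(get_instance(root, (1, 10, 15, 2)))  # MS2
```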

For more information on WebLogic Server MIBs, see WebLogic Server® 9.2 MIB Reference in the WebLogic Server documentation.

SNMP Agent

Each managed resource uses an SNMP agent to update the relevant information in the MIB. To do this, configure the SNMP agent to detect certain conditions within a managed resource and send a trap notification (report) to the SNMP manager. You can configure the SNMP agent to generate traps in one of the following ways:

SNMP Manager

The SNMP manager manages the SNMP agents. It is also the primary interface to the Network Management System.

Network Management System (NMS)

The Network Management System forms the interface with the user. It gathers data using the SNMP manager and presents it to the user.

JMS

Java Message Service (JMS) is another destination for pipeline alerts and SLA alerts. You must use a JNDI URL for the JMS alert destination. When you configure an alert rule to post a message to a JMS destination, you must create a JMS connection factory and a queue or topic, and target them to the appropriate JMS server in the WebLogic Server Administration Console. For information on how to do this, see "Configuring a JMS Connection Factory" and "JMS Resource Naming Rules for Domain Interoperability" in Configuring JMS System Resources in Configuring and Managing WebLogic JMS. When you define the JMS alert destination, you can use either a destination queue or a destination topic. The message type can be bytes or text. For more information on how to configure a JMS alert destination, see Alert Destinations in Using the AquaLogic Service Bus Console.
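The sketch below splits a JMS alert destination URI into its parts. It assumes the jms://host:port/connection-factory-JNDI-name/destination-JNDI-name form used by AquaLogic Service Bus JMS endpoints and that neither JNDI name contains a slash; confirm the exact syntax in Alert Destinations in Using the AquaLogic Service Bus Console. AlertConnectionFactory and AlertQueue are hypothetical JNDI names.

```python
from typing import Dict, Union
from urllib.parse import urlsplit


def parse_jms_uri(uri: str) -> Dict[str, Union[str, int, None]]:
    """Split a URI of the assumed form jms://host:port/factoryJndiName/destinationJndiName."""
    parts = urlsplit(uri)
    if parts.scheme != "jms":
        raise ValueError("expected a jms:// URI")
    # Assumption: neither JNDI name contains '/'.
    segments = parts.path.lstrip("/").split("/")
    if len(segments) != 2 or not all(segments):
        raise ValueError("expected /<connection factory>/<destination>")
    return {
        "host": parts.hostname,
        "port": parts.port,
        "connection_factory": segments[0],
        "destination": segments[1],
    }


print(parse_jms_uri("jms://localhost:7001/AlertConnectionFactory/AlertQueue"))
```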

Reporting

Reporting is another way to monitor and analyze both pipeline alerts and SLA alerts. This type of monitoring is discussed in detail in Reporting.

 


About Monitoring

This section contains information on the following topics:

Aggregation Interval

In AquaLogic Service Bus, the monitoring subsystem collects statistical information, such as message count and execution time, over an aggregation interval. The aggregation interval is the time period over which statistical data is collected and displayed in the AquaLogic Service Bus Console.

The following example illustrates how the aggregation interval works:

Consider a proxy service that you have configured to process purchase orders and for which you have enabled monitoring with an aggregation interval of 10 minutes. When you send the first message through the proxy service, monitoring starts. Until the first 10 minutes elapse, the Service Summary page displays partially computed data, because the system does not yet have 10 minutes of data. After the first 10 minutes of data aggregation, the system always displays the last 10 minutes of data. For example, at the 14th minute, the Dashboard displays minutes 4 through 14. If no messages are processed after the 15th minute, then at the 25th minute no data is displayed for the service. For more information about how the aggregation interval affects the display of monitored information, see Alert Rules.
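The sliding behavior described above can be modeled in a few lines. This sketch assumes one message per minute and treats the interval as the half-open window (now - interval, now], which is one reasonable reading of "minutes 4 through 14"; AggregationWindow is a hypothetical name, not a product class.

```python
from typing import List


class AggregationWindow:
    """Toy model of a service's aggregation interval: only messages processed
    in the half-open window (now - interval, now] are reported."""

    def __init__(self, interval_minutes: float) -> None:
        self.interval = interval_minutes
        self.minutes: List[float] = []  # minute at which each message was processed

    def record(self, minute: float) -> None:
        self.minutes.append(minute)

    def count(self, now: float) -> int:
        lower = now - self.interval
        return sum(1 for m in self.minutes if lower < m <= now)


window = AggregationWindow(10)
for minute in range(1, 16):  # one message per minute, minutes 1 to 15
    window.record(minute)

print(window.count(5))   # 5: fewer than 10 minutes of data so far
print(window.count(14))  # 10: minutes 5 through 14
print(window.count(25))  # 0: nothing processed after minute 15
```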

You must explicitly enable monitoring for any business or proxy service that you create; monitoring is disabled by default. After you have enabled monitoring and set the aggregation interval for your individual services, you can enable or disable monitoring for all those services from the Global Settings page in the System Administration module. For more information, see Monitoring Services.

SLA alerts are automated responses to Service Level Agreement (SLA) violations or occurrences, and they are displayed on the Dashboard. You define alert rules to specify unacceptable service performance according to your business and performance requirements. When you configure an alert rule, you can specify the aggregation interval for that rule. This aggregation interval is not affected by the aggregation interval set for the service. Alert rules also allow you to send notifications about the violation to the configured alert destinations. For information on defining alert rules, see Creating Alert Rules in Using the AquaLogic Service Bus Console.
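To show why a rule's own aggregation interval matters, the following sketch evaluates two hypothetical rules that differ only in their interval against the same response-time samples. AlertRule, the average-response-time condition, and the data are assumptions for illustration; actual alert rule conditions are configured in the Console.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlertRule:
    """Hypothetical alert rule: fire when the average response time within the
    rule's own aggregation interval exceeds a threshold."""
    name: str
    interval_minutes: float
    max_avg_response_ms: float

    def evaluate(self, samples: List[Tuple[float, float]], now: float) -> bool:
        # samples are (minute, response_ms) pairs; the service's interval is not used.
        in_window = [ms for minute, ms in samples if now - self.interval_minutes < minute <= now]
        if not in_window:
            return False
        return sum(in_window) / len(in_window) > self.max_avg_response_ms


# Ten fast minutes followed by five slow ones.
samples = [(m, 100.0) for m in range(1, 11)] + [(m, 900.0) for m in range(11, 16)]
short_rule = AlertRule("SlowResponse-5min", 5, 500.0)
long_rule = AlertRule("SlowResponse-15min", 15, 500.0)
print(short_rule.evaluate(samples, now=15))  # True: average 900 ms
print(long_rule.evaluate(samples, now=15))   # False: average about 367 ms
```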

Monitoring Architecture

The following diagram shows the architecture of AquaLogic Service Bus monitoring.

Figure 3-2 Monitoring Architecture

The Statistics Configuration Manager stores and manages the statistics configuration for each operational resource. An operational resource is a unit for which the monitoring subsystem can collect statistical information. Operational resources include proxy services, service operations, and pipelines. The Statistics Configuration Manager is notified about changes in the service definition, such as adding, updating, or deleting a pipeline.

Each managed server in a cluster hosts a Statistics Collector. The Statistics Collector collects statistics on operational resources as directed by the Statistics Configuration Manager. It also keeps a history of samples within the aggregation interval for the collected statistics. At every system-defined checkpoint interval, the Statistics Collector stores a snapshot of current statistics in a persistent store for recovery purposes and sends the information to the Statistics Aggregator.

One of the managed servers in a cluster, called the Aggregating Server or Aggregator, is designated to aggregate cluster-wide statistics. At system-defined checkpoint intervals, each managed server in the cluster sends a checkpoint snapshot of its contributions to the Aggregator. The Aggregator then combines this information and offers cluster-wide statistics to its clients through Retriever APIs. The clients of the Aggregator are the Dashboard, SLA Manager, and Service Monitoring modules.

To contribute a data point to the system, an operational resource in the system, such as a run-time proxy service pipeline, calls a method on the Statistics Collector, and identifies itself, the statistic, and the data point.
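The collector-to-aggregator flow described in this section can be sketched as follows. StatisticsCollector, Stat, and aggregate are hypothetical names; the statistics are kept in memory only (no checkpoint persistence), and the merge rules (sum counts and totals, take the overall minimum and maximum) are an assumption about how cluster-wide figures could be combined.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (operational resource, statistic name)


@dataclass
class Stat:
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "Stat") -> None:
        # Counts and totals add; extremes take the overall min and max.
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)


class StatisticsCollector:
    """Toy per-managed-server collector (hypothetical, not the product API)."""

    def __init__(self, server: str) -> None:
        self.server = server
        self._stats: Dict[Key, Stat] = defaultdict(Stat)

    def record(self, resource: str, statistic: str, value: float) -> None:
        # The resource identifies itself, the statistic, and the data point.
        self._stats[(resource, statistic)].add(value)

    def snapshot(self) -> Dict[Key, Stat]:
        # Copy so that later records do not change a snapshot already sent.
        return {k: Stat(s.count, s.total, s.minimum, s.maximum) for k, s in self._stats.items()}


def aggregate(snapshots: Iterable[Dict[Key, Stat]]) -> Dict[Key, Stat]:
    """Combine per-server snapshots into cluster-wide statistics."""
    merged: Dict[Key, Stat] = defaultdict(Stat)
    for snapshot in snapshots:
        for key, stat in snapshot.items():
            merged[key].merge(stat)
    return dict(merged)


ms1, ms2 = StatisticsCollector("MS1"), StatisticsCollector("MS2")
ms1.record("OrderPS", "response-time-ms", 120.0)
ms1.record("OrderPS", "response-time-ms", 80.0)
ms2.record("OrderPS", "response-time-ms", 200.0)
cluster = aggregate([ms1.snapshot(), ms2.snapshot()])
s = cluster[("OrderPS", "response-time-ms")]
print(s.count, s.total, s.minimum, s.maximum)  # 3 400.0 80.0 200.0
```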

Monitoring Services

When you create a business or proxy service, monitoring is disabled by default for that service. Enable monitoring as follows:

If you create alert rules for a service, you must enable monitoring for the service before you create the rules. For more information, see Alert Rules and "Create an Alert Rule" in Monitoring in Using the AquaLogic Service Bus Console.

Refresh Rate of Monitored Information

At run time, the default refresh rate for the Dashboard page is one minute. However, it may take up to three minutes for the information to be displayed on the Dashboard. This delay occurs because of the time gaps between when the messages are processed by the proxy service, when the metrics are collected, and the refresh rate of the Dashboard. The system works as follows:

  1. Every minute the Statistics Collector sends the current snapshot to the aggregator.
  2. Every minute, the aggregator merges all the documents it has received from the managed servers within the last minute.
  3. The AquaLogic Service Bus Console refreshes every minute; that is, it runs a query on the aggregated document and then displays the results.

Figure 3-3 Aggregation Time Line

For example, a proxy service starts sending data at T1, as shown in Figure 3-3. At T2, that is, the second minute, the Statistics Collector sends the data to the aggregator. However, if an aggregation cycle has just occurred, the aggregator does not merge this data until the next aggregation cycle, one minute later, which is a maximum of two minutes after the data was generated. Once merged, the data is available to the AquaLogic Service Bus Console. Because the console refreshes every minute, if a refresh cycle has just passed, the data appears at the next refresh. As a result, the console displays the alerts after a maximum of three minutes.
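The three-minute bound follows from three one-minute cycles in series. The following simplified model assumes that all three cycles have a one-minute period, that processing is instantaneous, and that data waits for the next cycle strictly after it arrives; it computes the display delay for given cycle phases. The function names are hypothetical.

```python
import math


def next_tick(t: float, period: float, phase: float) -> float:
    """First cycle time strictly after t, for cycles at phase + k * period.
    'Strictly after' models data that has just missed a cycle."""
    return phase + (math.floor((t - phase) / period) + 1) * period


def display_delay(t0: float, collector_phase: float, aggregator_phase: float,
                  console_phase: float, period: float = 1.0) -> float:
    """Minutes from data generation at t0 until the Console can show it."""
    sent = next_tick(t0, period, collector_phase)       # collector sends snapshot
    merged = next_tick(sent, period, aggregator_phase)  # aggregator merges
    shown = next_tick(merged, period, console_phase)    # console refreshes
    return shown - t0


print(display_delay(0.0, 0.0, 0.0, 0.0))  # 3.0: every cycle was just missed
print(display_delay(0.0, 0.1, 0.2, 0.3))  # about 0.3: each cycle comes right after
```

Sweeping the phases, for example max(display_delay(0.0, c / 10, a / 10, r / 10) for c in range(10) for a in range(10) for r in range(10)), confirms that the delay does not exceed three minutes, up to floating-point rounding.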

You can change the Dashboard polling interval in the System Administration module in the AquaLogic Service Bus Console. For information on how to do this, see "Setting the Dashboard Polling Interval Refresh Rate" in System Administration in Using the AquaLogic Service Bus Console.

Dashboard

When you log in to the AquaLogic Service Bus Console, the Dashboard is displayed automatically. The Dashboard shows the monitoring information for the last 30 minutes. It provides an overview of the state of the system, organized by server, services, and alerts, as shown in the following figure.

Figure 3-4 AquaLogic Service Bus Dashboard

As shown in the previous figure, the Dashboard displays the following information:

From the Dashboard, you can drill down into the system and easily find specific information, such as the average execution time of a service, the date and time an alert occurred, or how long a server has been running.

You configure the Dashboard and monitoring in the AquaLogic Service Bus Console, which is described in the Monitoring and System Administration sections of Using the AquaLogic Service Bus Console.

 


Service Summary

This section provides information on the following topics:

About the Service Summary

The Service Summary panel provides an overview of the state of the services. The Service Summary pie chart shows, by severity, the percentage of alerts raised in the last 30 minutes for all services that have alerts defined and monitoring enabled. The severity level of alerts is user configurable and has no absolute meaning. Severity types include Fatal, Critical, Major, Minor, Warning, and Normal. The services with the most alerts are listed beneath the pie chart, as shown in the following figure. Up to ten services are listed, in descending order of the number of alerts.

Figure 3-5 Services Summary Pane

From the Service Summary panel, you can access more information about alerts by clicking the following:

Each of these pages is fully described in the sections that follow.
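The pie chart and the most-alerts list described in About the Service Summary can be sketched as two small computations over the alerts raised in the last 30 minutes. The Alert type, service names, and minute-based timestamps are hypothetical; the Console computes these figures itself.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

SEVERITIES = ("Fatal", "Critical", "Major", "Minor", "Warning", "Normal")


@dataclass
class Alert:
    service: str   # fully qualified service name
    severity: str  # one of SEVERITIES
    minute: float  # when the alert was raised


def _recent(alerts: List[Alert], now: float, window: float) -> List[Alert]:
    return [a for a in alerts if now - window < a.minute <= now]


def severity_percentages(alerts: List[Alert], now: float,
                         window: float = 30) -> Dict[str, float]:
    """Pie-chart data: share of recent alerts per severity, in percent."""
    recent = _recent(alerts, now, window)
    counts = Counter(a.severity for a in recent)
    total = len(recent)
    return {s: (100.0 * counts[s] / total if total else 0.0) for s in SEVERITIES}


def top_services(alerts: List[Alert], now: float, window: float = 30,
                 limit: int = 10) -> List[Tuple[str, int]]:
    """Up to `limit` services, most alerts first."""
    return Counter(a.service for a in _recent(alerts, now, window)).most_common(limit)


alerts = [
    Alert("OrderProject/OrderPS", "Major", 12),
    Alert("OrderProject/OrderPS", "Critical", 20),
    Alert("BillingProject/BillingBS", "Major", 25),
    Alert("OrderProject/OrderPS", "Minor", 35),
    Alert("LegacyProject/LegacyPS", "Fatal", 2),  # older than 30 minutes at now=40
]
print(severity_percentages(alerts, now=40))
print(top_services(alerts, now=40))
```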

WARNING: When a service (or its component; for example, a pipeline node) is renamed or relocated, its statistical data is lost.

For information on how to access detailed alert information, see "Viewing the Dashboard Statistics" in Monitoring in Using the AquaLogic Service Bus Console.

Service Monitoring Summary

The Service Monitoring Summary page provides two views of service monitoring statistics, as shown in the following figures.

The first is a dynamic view of statistical data collected by each service. This view is available when you select Current Aggregation Interval in the Show Metrics For field. The aggregation interval displayed in this view determines the statistics that are displayed. For example, if the aggregation interval of a particular service is 20 minutes, that service's row displays the data collected in the last 20 minutes.

Figure 3-6 Service Monitoring Summary Page—Current Aggregation Interval

The second view is a running count of the metrics. This view is available when you select Since Last Reset in the Show Metrics For field. The statistics displayed in each row are for the period since you last reset the statistics for an individual service or since you last reset the statistics for all services on the Global Settings page in the System Administration module.

Figure 3-7 Service Monitoring Summary Page—Since Last Reset

As shown in the top section of the preceding figures, you can filter the display of information using the following criteria:

The Service Monitoring Summary table displays the following information:

Note: An Action column is displayed when you have selected Since Last Reset in the Show Metrics For field. In this column, you can click the Reset Statistics icon for a specific service to reset the statistics for that service. When you confirm that you want to do this, the system deletes all monitoring statistics that were collected for the service since the last time you clicked the Reset Statistics icon or the last time you clicked Reset Statistics on the Global Settings page. However, the system does not delete the statistics being collected during the Current Aggregation Interval for the service. Once you click the Reset Statistics icon, the system immediately starts collecting monitoring statistics for the service again.
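The difference between the two views, and the fact that a reset leaves the current aggregation interval untouched, can be modeled as follows. ServiceCounters is a hypothetical name, and only message counts are tracked.

```python
from typing import List


class ServiceCounters:
    """Toy model of the two Show Metrics For views for one service."""

    def __init__(self, interval_minutes: float) -> None:
        self.interval = interval_minutes
        self._minutes: List[float] = []  # a real store would prune old entries
        self._since_reset = 0

    def record(self, minute: float) -> None:
        self._minutes.append(minute)
        self._since_reset += 1

    def current_interval(self, now: float) -> int:
        """Messages in the last `interval` minutes (Current Aggregation Interval)."""
        return sum(1 for m in self._minutes if now - self.interval < m <= now)

    def since_last_reset(self) -> int:
        """Running count (Since Last Reset)."""
        return self._since_reset

    def reset(self) -> None:
        # Clears only the running count; current-interval data is kept,
        # and counting resumes immediately.
        self._since_reset = 0


counters = ServiceCounters(20)
for minute in range(1, 31):
    counters.record(minute)
counters.reset()
print(counters.current_interval(30), counters.since_last_reset())  # 20 0
counters.record(31)
print(counters.current_interval(31), counters.since_last_reset())  # 20 1
```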

Service Monitoring Details

The Service Monitoring Details page provides you with two views of detailed information about a specific service, as shown in the following figures.

The first is a dynamic view of the statistical data collected by the service. This view is available when you select Current Aggregation Interval in the Show Metrics For field. The aggregation interval displayed in this view determines the statistics that are displayed. For example, if the aggregation interval of this service is 20 minutes, the view displays the data collected in the last 20 minutes.

Figure 3-8 Service Monitoring Details Page—Current Aggregation Interval

The second view is a running count of the metrics. This view is available when you select Since Last Reset in the Show Metrics For field. The statistics displayed are for the period since you last reset statistics for this particular service or since you last reset statistics for all services on the Global Settings page in the System Administration module.

Figure 3-9 Service Monitoring Details Page—Since Last Reset

The Service Monitoring Details Page displays the following set of information:

 


Server Summary

This section provides information on the following topics:

About the Server Summary

The Server Summary panel provides an overview of the state of the servers. The pie chart shows the status of each server in the domain. The status for each server is derived from the WebLogic Diagnostic Service (see Configuring and Using the WebLogic Diagnostics Framework). The five most critical servers are displayed, as shown in Figure 3-10.

Figure 3-10 Server Summary Pane

The displayed statuses have the following meanings:

Log Summary

The AquaLogic Service Bus Console allows you to view the WebLogic Server domain log. The domain log file provides a central location from which to view the overall status of the domain. Each server instance forwards a subset of its messages to a domain-wide log file. By default, servers forward only messages of severity level NOTICE or higher. You can modify the set of messages that are forwarded. For more information, see Understanding WebLogic Logging Services in Configuring Log Files and Filtering Log Messages.
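The forwarding threshold and the per-severity grouping shown in the Log Summary can be sketched as follows. The severity ordering is assumed from the WebLogic logging documentation (verify it for your release), and the messages are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed WebLogic severity order, lowest to highest (verify for your release).
SEVERITY_ORDER = ("Trace", "Debug", "Info", "Notice", "Warning",
                  "Error", "Critical", "Alert", "Emergency")
RANK = {name: rank for rank, name in enumerate(SEVERITY_ORDER)}


def forwarded_to_domain_log(severity: str, threshold: str = "Notice") -> bool:
    """True if a message of this severity passes the forwarding threshold."""
    return RANK[severity] >= RANK[threshold]


def log_summary(messages: Iterable[Tuple[str, str]],
                threshold: str = "Notice") -> Dict[str, int]:
    """Count forwarded (severity, text) messages per severity."""
    counts = Counter(sev for sev, _ in messages if forwarded_to_domain_log(sev, threshold))
    return {sev: counts[sev] for sev in SEVERITY_ORDER if counts[sev]}


server_messages = [
    ("Info", "Server started"),
    ("Notice", "Deployment activated"),
    ("Error", "Connection refused"),
    ("Error", "Timeout calling business service"),
    ("Debug", "Cache lookup"),
]
print(log_summary(server_messages))  # {'Notice': 1, 'Error': 2}
```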

If you configure a Log action in a pipeline, its messages are written to the server log. Unless you configure WebLogic Server to forward these messages to the domain log, you cannot view them from the AquaLogic Service Bus Console. For information on how to do this, see Create Log Filters in the WebLogic Server Administration Console Online Help.

To see the number of messages currently raised by the system, click the View Log Summary link in the Server Summary panel. A table is displayed that contains the number of messages grouped by severity, as shown in the following figure.

Figure 3-11 Log Summary

The displayed message statuses have the following meanings:

This display is based on the health state of the running servers, as defined by the WebLogic Diagnostic Service. For more information about the WebLogic Diagnostic Service, see Configuring and Using the WebLogic Diagnostics Framework.

To view the domain log for a particular type of message, click the number corresponding with the type of message. The following figure shows an example of a domain log file displayed in the AquaLogic Service Bus Console.

Figure 3-12 Domain Log File Entries

The following information is displayed:

For more information, see "Message Attributes" in Understanding WebLogic Logging Services in Configuring Log Files and Filtering Log Messages.

To display details of a single log file on the page, select the radio button for the appropriate log, then click the View button.

Server Summary

The Server Summary page provides a customizable table of servers, as shown in the following figure.

Figure 3-13 Server Summary Page

As shown in the upper section of Figure 3-13, the Server Summary page displays the number of messages currently raised by the system. For information about the meaning of each type of status message, see Log Summary.

The server table displays the following information:

To view this information in the table as a pie or bar chart, click View as a Graph.

To filter the display of servers, click Customize Table above the server table. The available filtering is shown in the following figure.

Figure 3-14 Server Summary Table Filter

For information about how to use the Server Summary Table Filter, see "Customize Your View of the Server Summary" in Monitoring in Using the AquaLogic Service Bus Console.

Server Details

You can access the View Server Details page by clicking the name of a server under Most Critical Servers or by clicking the name of a server on the Server Summary page.

The View Server Details page enables you to view more server monitoring details, as shown in the following figure.

Figure 3-15 Server Details Page—General Tab

The information displayed on this page is a subset of the Monitoring tab in the AquaLogic Service Bus Console Server Settings page. The details available are:

For more information, see WebLogic Server Administration Console Online Help.

 


Alert Summary

This section provides information on the following topics:

About the Alert Summary

AquaLogic Service Bus can generate two types of alerts:

Pipeline Alerts

Pipeline alerts are the alerts triggered when Alert actions configured within a pipeline are executed. The Alert action is grouped with other actions under the Reporting category. The actions available under the Reporting category are:

For more information, see Proxy Service: Actions in Using the AquaLogic Service Bus Console. Pipeline alerts are monitored using the alert destinations.

Service Level Agreement (SLA) Alerts

Service Level Agreement (SLA) alerts are generated when a service violates its service level agreement or a predefined condition. The Alert Summary panel contains a customizable table displaying information about violations or occurrences of events in the system. These violations and occurrences are based on SLAs. AquaLogic Service Bus provides various SLA monitors that you can configure to monitor proxy and business services, for example, maximum execution time and authorization failure. You configure these monitors by creating alert rules. When a rule evaluates to true, it raises an alert. The alert can be sent to the Console, an SNMP trap, the reporting stream, e-mail recipients, or a JMS queue or topic. These alert destinations are configured using the Alert Destination resource.
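The flow from a rule evaluating to true to notification at each destination can be sketched as below for the maximum execution time monitor mentioned above. AlertDestination, evaluate_max_execution_time, and the use of plain callables as stand-ins for the Console, SNMP, reporting, e-mail, and JMS channels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Alert = Dict[str, object]


@dataclass
class AlertDestination:
    """Stand-in for an Alert Destination resource: each channel is a callable
    (Console, SNMP trap, reporting, e-mail, JMS) that receives the alert."""
    name: str
    channels: List[Callable[[Alert], None]] = field(default_factory=list)


def evaluate_max_execution_time(rule_name: str, max_ms: float, observed_ms: float,
                                severity: str,
                                destination: AlertDestination) -> Optional[Alert]:
    """Raise an alert when the observed execution time exceeds the rule's maximum."""
    if observed_ms <= max_ms:
        return None  # rule evaluates to false: no alert
    alert: Alert = {"rule": rule_name, "severity": severity, "observed_ms": observed_ms}
    for send in destination.channels:
        send(alert)
    return alert


console_log: List[Alert] = []
jms_outbox: List[Alert] = []
ops = AlertDestination("OpsTeam", [console_log.append, jms_outbox.append])
evaluate_max_execution_time("MaxExecTime", 1000.0, 2500.0, "Major", ops)
evaluate_max_execution_time("MaxExecTime", 1000.0, 400.0, "Major", ops)
print(len(console_log), len(jms_outbox))  # 1 1
```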

Note: The AquaLogic Service Bus Console provides several ways to view and find alerts, such as by severity and by service. You can also view alerts graphically. For information on how to do this, see "Listing and Locating Alerts" and "Viewing a Chart of Alerts" in Monitoring in Using the AquaLogic Service Bus Console.

The following figure shows the View Alert Summary List:

Figure 3-16 View Alert Summary List

The Alert Summary panel shows alerts for the last 30 minutes. It contains the following details:

To view a complete list of alerts, click View Alert Summary List. See System Alerts History.

To customize the information displayed in the Alert Summary panel, click Customize Table above the summary table. The available filtering options are shown in the following figure.

Figure 3-17 Alert Summary Table Filter

To customize the sort order of the displayed alerts, click the sort icons beside the column headers.

System Alerts History

To access the Customized System Alerts History page, in the Alert Summary panel, click View Alert Summary List. The Customized System Alerts History page enables you to view all the alerts by paging through the table (see Figure 3-18) or by filtering the display of the alerts (see Figure 3-19).

Figure 3-18 Customized System Alerts History


You can customize the table shown in Figure 3-18. The table provides the following details:

To view a pie or bar chart of the alerts, click View Graph in the table.

To search for a specific alert, filter the display of alerts by clicking Customize Table in the Customized System Alerts History table. The available filtering options are shown in the following figure.

Figure 3-19 System Alerts Table Filter


For information about how to use the Alerts Table Filter, see "Customizing Your View of Alerts" in Monitoring in Using the AquaLogic Service Bus Console.

Note: When an alert is raised in your configuration, a message is sent to your domain log, which resides at the following location:

       <BEA_HOME>\servers\server_name\logs\domain_name.log

where

The message is logged as an alert and has this message ID: BEA-394015.

The message body is a string that consists of the following details:

System Alert Details

The System Alert Details page displays complete information about the alert and allows you to add an annotation to the alert, as shown in the following figure.

Figure 3-20 System Alert Details Page


The following details are displayed:

You access this page from the Dashboard by clicking Alert Severity in the Alert Summary table. You can also delete the alert from this page.

View Alert Rule Details

The View Alert Rule Details page displays complete information about a specific alert rule, as shown in the following figure.

Figure 3-21 View Alert Rule Details Page


The following information is displayed:

For information about how to define alert rules, see "Create an Alert Rule" in Monitoring in Using the AquaLogic Service Bus Console.

 


Alert Rules

This section provides information on the following topics:

About Alert Rules

As mentioned earlier, alerts are automated responses to SLA violations and are displayed on the Dashboard. You define alert rules to specify unacceptable service performance according to your business and performance requirements. When you configure an alert rule, you specify the aggregation interval for that rule. The alert rule's aggregation interval is not affected by the aggregation interval set for the service.

On the Alert Rule page, if you set Alert Frequency to Every Time, a notification is issued every time the alert rule evaluates to true. If you set Alert Frequency to Once When Condition Is True, a notification is issued the first time the rule evaluates to true, and no further notifications are generated until the condition resets and evaluates to true again.

When Alert Frequency is set to Every Time, the number of times an alert rule fires depends on the aggregation interval and the sample interval associated with that rule. For example, suppose the aggregation interval is set to 5 minutes; the sample interval is 1 minute. The rule is evaluated each time 5 samples of data are available. Therefore, the rule is first evaluated approximately 5 minutes after it is created, and every minute thereafter.
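The evaluation schedule in the example above can be sketched as follows. This is a simplified model, assuming exactly one sample per sample interval and an evaluation whenever a full aggregation window of samples is available; `evaluation_minutes` is a hypothetical helper, not a product function.

```python
def evaluation_minutes(aggregation_minutes: int, sample_minutes: int = 1,
                       horizon_minutes: int = 10) -> list[int]:
    """Minutes after rule creation at which the rule is evaluated.

    Assumes one sample is collected per sample interval and that the rule is
    evaluated whenever a full aggregation window of samples is available.
    """
    samples_needed = aggregation_minutes // sample_minutes
    times = []
    for n_samples in range(1, horizon_minutes // sample_minutes + 1):
        if n_samples >= samples_needed:
            times.append(n_samples * sample_minutes)
    return times


# 5-minute aggregation interval, 1-minute samples, first 8 minutes.
print(evaluation_minutes(5, horizon_minutes=8))  # [5, 6, 7, 8]
```

The first evaluation waits for a full window (minute 5); after that, each new sample completes a new window, so evaluations follow every minute.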

When Alert Frequency is set to Once When Condition Is True, after an alert fires for the first time in an aggregation interval, it is not fired again in the same aggregation interval.
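The difference between the two Alert Frequency settings can be sketched over a sequence of rule evaluation results. This minimal model assumes that Once When Condition Is True means "notify on a true result only if the previous evaluation was false", which reflects the reset behavior described for the Alert Rule page; the function and the `"every_time"`/`"once"` labels are hypothetical.

```python
def notifications(results: list[bool], frequency: str) -> list[int]:
    """Return the indices of the evaluations that issue a notification.

    frequency is "every_time" or "once" (Once When Condition Is True).
    "once" is modeled as: notify on a true result only if the previous
    evaluation was false (or this is the first evaluation), so the
    condition must reset before the rule can notify again.
    """
    if frequency not in ("every_time", "once"):
        raise ValueError(f"unknown frequency: {frequency}")
    fired = []
    previous = False
    for i, result in enumerate(results):
        if result and (frequency == "every_time" or not previous):
            fired.append(i)
        previous = result
    return fired


evaluations = [False, True, True, True, False, True]
print(notifications(evaluations, "every_time"))  # [1, 2, 3, 5]
print(notifications(evaluations, "once"))        # [1, 5]
```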

Creating an alert rule involves three parts:

Note: Rules can only be created for services that are enabled for monitoring.

For more information about creating an alert rule, see "Create an Alert Rule" in Monitoring in Using the AquaLogic Service Bus Console.

Some Uses for Alerts

The following are some uses for alerts:

Understanding Alert Rules

The information in this section is presented in question-answer format.

Question 1: I created a service with an alert rule that has the following condition expression:

       Aggregation Interval:0 Hours(s) and 1 Minutes
       Message Count = 0

It has been 10 minutes and I have not received any alerts.

Answer: Monitoring statistics collection for each statistical attribute associated with a service, such as message count and error count, begins when the value of that statistic first changes. Data collection for the Message Count attribute begins when the service processes its first message and the Message Count attribute is incremented. Similarly, data collection for the Error Count statistic starts only when the service encounters its first error and the Error Count attribute is incremented. If the service is idle, no monitoring information is collected for it and, consequently, no alert rules are triggered. After the first message is processed, monitoring data for that service is collected continually, even if the service receives no further requests. Check whether the service has received any requests.
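The lazy start of statistics collection explains why a "Message Count = 0" rule stays silent on an idle service. The sketch below is a simplified model under stated assumptions: the `ServiceStatistics` class is hypothetical, and it resets a per-interval count at the end of each aggregation interval rather than reproducing the product's sampling.

```python
from typing import Callable


class ServiceStatistics:
    """Hypothetical per-service statistics that start collecting lazily."""

    def __init__(self) -> None:
        self.collecting = False
        self.interval_message_count = 0

    def record_message(self) -> None:
        # The first processed message starts collection for this statistic.
        self.collecting = True
        self.interval_message_count += 1

    def close_interval(self, condition: Callable[[int], bool]) -> bool:
        """Evaluate a rule at the end of an aggregation interval, then reset."""
        # An idle service has no collected data, so the rule cannot fire.
        fired = self.collecting and condition(self.interval_message_count)
        self.interval_message_count = 0
        return fired


stats = ServiceStatistics()
no_traffic = lambda count: count == 0
idle = stats.close_interval(no_traffic)   # False: collection has not started
stats.record_message()
busy = stats.close_interval(no_traffic)   # False: one message in this interval
quiet = stats.close_interval(no_traffic)  # True: collecting, zero messages
```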

Question 2: I defined a new alert rule with an aggregation interval that did not exist before and that rule does not seem to raise any alerts. All other rules created prior to this one are working correctly.

Answer: The cause is the same as for Question 1: after you create a rule with a new aggregation interval, the service must process at least one request before the rule can trigger. Rules defined with other aggregation interval values are not affected by the new rule.

Question 3: I restarted the server and none of my services have processed any requests. Why do I see alerts being generated?

Answer: Once the Monitoring subsystem has started collecting data for services, stopping and restarting a server does not abort the collection process. The data collected is persisted and statistic collection picks up from where it left off.

Question 4: I have an alert rule with the following definition:

       Aggregation Interval:0 Hours(s) and 5 Minutes
       Success Rate < 80%

The Service Monitoring Summary page shows the following values:

       Message Count: 4

       Error Count: 1

Why am I being alerted? Shouldn't the success rate be 80%?

Answer: No. The message count displayed is the total of all messages processed by the service, including the ones that generated an error. Consequently, 3 of the 4 messages succeeded, and the success rate is 3/4 = 75%, which is below the 80% threshold.
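The arithmetic behind this answer can be written out as a small helper. `success_rate` is a hypothetical function for illustration; raising an error when no messages were processed is a design choice made here, not documented product behavior.

```python
def success_rate(message_count: int, error_count: int) -> float:
    """Percentage of messages processed without error.

    message_count already includes the messages that generated errors.
    """
    if message_count == 0:
        raise ValueError("success rate is undefined when no messages were processed")
    return 100.0 * (message_count - error_count) / message_count


rate = success_rate(message_count=4, error_count=1)
print(rate)       # 75.0
print(rate < 80)  # True: the "Success Rate < 80%" rule fires
```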

Question 5: I created a service with an aggregation interval of 10 minutes that sends a JMS message. I could see the message on the Service Monitoring Summary page, but some time later the message count for my service shows as zero.

Answer: The Service Monitoring Summary page displays dynamic statistics. In this case, it shows the message count in the last 10 minutes. Because no messages were processed by the system in the last 10 minutes, the message count is displayed as zero.
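The dynamic, window-based count can be sketched as follows. This is a simplified model that counts individual message timestamps (in minutes) rather than the per-minute samples the monitoring system actually aggregates; `messages_in_window` is a hypothetical helper.

```python
def messages_in_window(timestamps_min: list[float], now_min: float,
                       window_min: float = 10) -> int:
    """Count messages whose timestamps fall in the last window_min minutes."""
    return sum(1 for t in timestamps_min if now_min - window_min < t <= now_min)


sent_at = [2.0]  # one JMS message sent at minute 2
print(messages_in_window(sent_at, now_min=5))   # 1: still inside the 10-minute window
print(messages_in_window(sent_at, now_min=13))  # 0: the message has aged out
```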

Question 6: I changed the aggregation interval of a service from 10 minutes to 5 minutes. The Service Monitoring Summary page shows all statistics as zero. One of the alert rules on this server, configured on a statistic with a 2-minute aggregation interval, did not fire in the following minute.

Answer: Changing the aggregation interval for a service removes all statistical information for that service and for the alerts associated with it. The alert rule is reinitialized and can trigger an alert again only after its aggregation interval has elapsed.

Question 7: I have a business service with multiple endpoints with an alert rule defined as Failover-count > 0. When one of the endpoints goes down, the alert is triggered. However, when a service has only one endpoint, the Failover-count is not incremented for this service. Instead, an error is generated.

Answer: Set the Retry count to a number greater than zero. For information about setting the Retry count, see "Adding a Business Service" in Business Services in the Using the AquaLogic Service Bus Console.

Question 8: I see that an alert is generated on the Dashboard but the value for the Alerts for Current Aggregation Interval field on the Service Monitoring Details page displays zero.

Answer: Alert rules are evaluated after the completion of the interval, which occurs after a checkpoint completion. If a rule evaluates to true, the rule's actions are triggered, a log is generated, and the interval-count statistic attribute (Alerts for Current Aggregation Interval) is incremented. The updated value of this counter is processed in the next checkpoint, 60 seconds later. The Monitoring Details page displays the updated count approximately one minute after the alert is generated.

Question 9: How does the active time for rules that span midnight work?

Answer: Consider the case where the active time for a rule is specified as 22:00 to 09:00.

On a given date, say June 7, the rule will be active and inactive as follows:

      June 6, 10:00 P.M. to June 7, 9:00 A.M. - Active

      June 7, 9:01 A.M. to June 7, 9:59 P.M. - Inactive

      June 7, 10:00 P.M. to June 8, 9:00 A.M. - Active
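The active-time check for a window that spans midnight can be sketched with Python's standard `datetime.time`. The function name is hypothetical; the inclusive boundaries are chosen to match the example above, where 9:00 A.M. is still active and 9:01 A.M. is not.

```python
from datetime import time


def is_rule_active(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls in the rule's active window (inclusive).

    When start > end, the window spans midnight, e.g. 22:00 to 09:00.
    """
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end


start, end = time(22, 0), time(9, 0)
print(is_rule_active(time(23, 30), start, end))  # True
print(is_rule_active(time(9, 0), start, end))    # True
print(is_rule_active(time(9, 1), start, end))    # False
print(is_rule_active(time(21, 59), start, end))  # False
print(is_rule_active(time(22, 0), start, end))   # True
```

Comparing times of day, rather than full dates, is what lets the same 22:00 to 09:00 setting apply to every night without tracking which calendar day a window started on.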

ServerStatistics data is sent to the Dashboard. It represents the monitoring runtime data for one minute; in other words, it contains the statistics for the services that have monitoring enabled.

The monitoring system aggregates the data received every minute and makes it available to the retriever subsystem. The aggregator thread runs 15 seconds behind the Statistics Collector checkpoint thread.

If you disable monitoring for the domain, you disable the collection of statistics for that domain. Monitoring data is no longer collected starting with the next minute, so no data is returned if you attempt to retrieve it. The same applies when you enable monitoring for the domain: the system initially does not show any data, but within a maximum of two minutes the Service Summary page displays the results of monitoring.

 


Statistics Associated With Different Resources

The following section provides more information on different statistics associated with:

SERVICE

A service has an inbound or outbound endpoint that is registered with the Service Directory of AquaLogic Service Bus. Such services are associated with other resources, such as WSDLs and security settings. The statistics reported for this resource type, together with the type of each statistic, are listed in Table 3-1.

Table 3-1 Statistics Reported for SERVICE
Statistic            Type
-------------------  --------
message-count        count
error-count          count
failover-count       count
response-time        interval
validation-errors    count
severity-warning     count
severity-major       count
severity-minor       count
severity-normal      count
severity-fatal       count
severity-critical    count
severity-all         count
failure-rate         count
wss-error            count
success-rate         count
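Table 3-1 distinguishes two statistic types. One way to picture the difference is the sketch below, which assumes that a count statistic is a running total and that an interval statistic (such as response-time) summarizes a set of measured durations by their count, minimum, maximum, and average. The class names and the millisecond unit are hypothetical and are not part of the product's monitoring API.

```python
from dataclasses import dataclass


@dataclass
class CountStatistic:
    """Running total, e.g. message-count or error-count."""
    value: int = 0

    def increment(self, by: int = 1) -> None:
        self.value += by


@dataclass
class IntervalStatistic:
    """Summary of measured durations, e.g. response-time (milliseconds assumed)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def record(self, duration: float) -> None:
        self.count += 1
        self.total += duration
        self.minimum = min(self.minimum, duration)
        self.maximum = max(self.maximum, duration)

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


errors = CountStatistic()
errors.increment()
response_time = IntervalStatistic()
for ms in (120.0, 80.0, 100.0):
    response_time.record(ms)
print(errors.value, response_time.minimum, response_time.maximum, response_time.average)
# 1 80.0 120.0 100.0
```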

FLOW_COMPONENT

Statistics are collected for two FLOW_COMPONENT types: Pipeline-pair nodes and Route nodes. For more details on Pipeline-pair nodes and Route nodes, see Table 2-1 of Modeling Message Flow in AquaLogic Service Bus. The statistics reported for FLOW_COMPONENT are listed in Table 3-2.

Table 3-2 Statistics Reported For FLOW_COMPONENT
Statistic            Type
-------------------  --------
elapsed-time         interval
message-count        count
error-count          count

WEBSERVICE_OPERATION

The statistics pertaining to WEBSERVICE_OPERATION resources, such as WSDLs, are collected and stored in a runtime XML file. The statistics reported for this type of resource are listed in Table 3-3.

Table 3-3 Statistics Reported for WEBSERVICE_OPERATION
Statistic            Type
-------------------  --------
elapsed-time         interval
message-count        count
error-count          count

 


Auditing

Auditing helps you keep track of changes in the configuration of AquaLogic Service Bus (ALSB). The three types of auditing you can perform are briefly described in the following sections:

Configuration Change Auditing

When you make configuration changes in the AquaLogic Service Bus Console, a record of the changes is generated and a history of all configuration changes is maintained. Only the previous image of each object is retained. You can view the history of configuration changes, and the list of resources that have been changed during the session, only through the Console. However, to access all the configuration information, you must activate the session.

Runtime Auditing of Messages

Auditing the entire message flow pipeline at run time is tedious. However, you can use the reporting action to perform selective auditing of the message flow pipeline at run time. You insert the reporting action at the required points in the message flow pipeline to extract the information you need. The extracted information can then be stored in a database or sent to the reporting stream to produce the audit report.
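The idea of selective auditing with report points can be sketched as a toy pipeline. Everything here is hypothetical: messages are plain dictionaries, `report` and `run_pipeline` are illustrative helpers rather than ALSB actions, and the `audit_records` list stands in for the database or reporting stream. The point is that each report step copies only the chosen fields and passes the message on unchanged.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Stage = Callable[[Message], Message]


def report(sink: List[Dict[str, str]], stage_name: str, keys: List[str]) -> Stage:
    """Hypothetical report point: copy selected fields to the sink, pass message on."""
    def stage(message: Message) -> Message:
        sink.append({"at": stage_name, **{k: message.get(k, "") for k in keys}})
        return message
    return stage


def run_pipeline(message: Message, stages: List[Stage]) -> Message:
    """Apply each stage in order, as a message flow pipeline would."""
    for stage in stages:
        message = stage(message)
    return message


audit_records: List[Dict[str, str]] = []
pipeline: List[Stage] = [
    report(audit_records, "request-received", ["orderId"]),
    lambda m: {**m, "status": "validated"},  # ordinary processing step
    report(audit_records, "after-validation", ["orderId", "status"]),
]
run_pipeline({"orderId": "A-17", "customer": "acme"}, pipeline)
print(audit_records)
```

Only `orderId` and `status` reach the audit records; the `customer` field is never reported, which is the selectivity that makes this cheaper than auditing the whole pipeline.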

Security Auditing

When a message is sent to a proxy service and there is a breach in transport-level authentication or in Web Services security, WebLogic Server generates an audit trail. You must configure WebLogic Server to generate this audit trail. With it, you can audit all security violations that occur in the message flow pipeline. WebLogic Server also generates an audit trail whenever it authenticates a user. For more information on security auditing, see Configuring the WebLogic Security Framework: Main Steps in AquaLogic Service Bus Security Guide.

