
User Guide


Monitoring

BEA AquaLogic Service Bus provides the capability to monitor and collect run-time information for systems operations purposes. AquaLogic Service Bus aggregates run-time statistics that you can view on a customizable Dashboard. The Dashboard allows you to monitor the health of the system and alerts you to problems in your messaging services. With this information, you can quickly and easily isolate and diagnose problems as they occur.

This chapter includes the following topics:

 


Monitoring Scenarios

The following describes some of the ways in which you can use AquaLogic Service Bus to check system operations and monitor messages.

Operational Health

The Dashboard page in the AquaLogic Service Bus Console lets you immediately view the state of all servers and monitored services. The Dashboard displays two pie charts, a table, and several links. The Service Summary pie chart shows the percentage of alerts according to their severity, over the last 30 minutes, for all services that have alert rules defined and monitoring enabled. The Server Summary pie chart shows the current status of every server in the AquaLogic Service Bus domain. Additionally, from the Server Summary panel, you can drill down and view the domain logs, which are grouped according to severity.

In addition to the pie charts, these summaries include lists of the most active services and the most critical servers. The service list displays up to ten services in descending order by number of alerts. The most critical servers list displays the ten most critical servers. This display is based on the health state of the running servers, as defined by the WebLogic Diagnostic Service. For more information about the WebLogic Diagnostic Service, see Configuring and Using the WebLogic Diagnostics Framework.

From each of the summaries, you can drill down into more detail by clicking a specific area on a pie chart or by clicking one of the links on the page.

The default Alert Summary table shows the severity of the alert, when the alert occurred, the name of the corresponding service, and what alert rule was violated. Alerts are displayed by severity. You can customize, search, and scroll through this table.

Alert Monitoring

When you log in to the AquaLogic Service Bus Console, you see a list of alerts on the Dashboard. Each row of the table displays the information that you have configured, such as the severity, timestamp, and associated service. You notice that numerous alerts have been generated since your last viewing. To find the problem, you filter the alerts and discover that the Service Level Agreement (SLA) violation is due to errors produced by the Post-Trade Processing proxy service. SLAs are agreements that define the precise level of service expected from AquaLogic Service Bus business and proxy services.

Alternatively, you might learn of the problem because an alert rule is configured to send messages when an SLA violation occurs. In this case, you are notified by email of the alert rule violation. After receiving the emails, you investigate and discover that the errors are produced by the Post-Trade Processing proxy service.

To narrow the problem down, you can use the reporting module. This scenario is continued in Message Tracking.

Statistics Monitoring

Suppose that you want to see how many messages in a particular service have been processed successfully and how many have failed. To access this information, open the Service Monitoring Summary page from the Dashboard and filter the display for the relevant service. Besides the number of messages that were processed successfully or failed, you can also see which project the service belongs to, the average execution time of message processing, and the number of alerts associated with the service. You can display monitoring statistics for the current aggregation interval, or for the period since you last reset statistics for this service or for all services.

Note: You use the Global Settings page in the System Administration module of the AquaLogic Service Bus Console to reset statistics. When you do this, make sure you are not in a WebLogic session on the WebLogic Server Administration Console.

Clicking the name of the service brings you to that service's Service Monitoring Details page. This page provides additional information such as the minimum and maximum response times and the overall average time it takes for the service to execute a message, the success-failure ratio, the number of messages that have failed because of security or validation errors, and the number of messages associated with proxy service components (pipelines and route nodes). You can display this information for specific operations associated with the service. Again, you can display these statistics for the current aggregation interval, or for the period since you last reset statistics for this service or for all services.

Verifying Service Level Agreements

You are notified by email of a large number of execution-time SLA violations from the Trade Execution proxy service. To track down this problem, you log in to the AquaLogic Service Bus Console. From the Dashboard, you drill into the service associated with the alerts and see that a pipeline operation that invokes an Avitek Web Service is unacceptably slow. After successfully renegotiating service-level characteristics with Avitek, you configure alert metrics to track Avitek's compliance with the agreement. Your company uses these results as the basis of ongoing discussions with Avitek regarding their performance.

 


About Monitoring

This section contains information on the following topics:

Aggregation Interval

In AquaLogic Service Bus, the monitoring subsystem collects statistical information, such as message-count and execution time, over an aggregation interval. The aggregation interval is the time period over which data points for a statistic are collected and then displayed in the AquaLogic Service Bus Console.

To illustrate how the aggregation interval works, suppose that you have configured a Purchasing Order proxy service that has monitoring enabled with an aggregation interval of 10 minutes. When a user sends the first message through the proxy service, monitoring starts. During the first 10 minutes, the Service Summary page displays partially computed data, because the system does not yet have 10 minutes of data. After the first 10 minutes of data aggregation, the system always displays the last 10 minutes of data. For example, at the 14th minute, the Dashboard displays minutes 4 through 14. If no messages are processed after the 15th minute, then at the 25th minute the display for the service shows zero messages. For more information about how the aggregation interval affects the display of monitored information, see Alert Rules.
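The moving window in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the product's implementation: the function name window_count, the per-minute sample list, and the decision to treat the window as half-open (older than the interval is excluded) are all assumptions made for the sketch.

```python
def window_count(samples, now, interval):
    """Sum the message counts whose minute falls in (now - interval, now].

    samples  -- list of (minute, count) pairs; hypothetical data layout
    now      -- the minute at which the page is viewed
    interval -- aggregation interval in minutes
    """
    return sum(count for minute, count in samples
               if now - interval < minute <= now)


# One message per minute from minute 1 through minute 15, then silence.
samples = [(minute, 1) for minute in range(1, 16)]

print(window_count(samples, now=5, interval=10))   # 5: partial data, window not yet full
print(window_count(samples, now=14, interval=10))  # 10: the last ten minutes
print(window_count(samples, now=25, interval=10))  # 0: nothing in the last ten minutes
```

The first call corresponds to the partially computed data shown before a full interval has elapsed; the last corresponds to the zero count shown once every message is older than the interval.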

You must explicitly enable monitoring for any business or proxy service that you create; monitoring is disabled by default. After you have enabled monitoring and set the aggregation interval for your individual services, you can enable or disable monitoring for all those services from the Global Settings page in the System Administration module. For more information, see Monitoring Services.

Alerts are automated responses to Service Level Agreement (SLA) violations or occurrences, which are displayed on the Dashboard. You define alert rules to specify unacceptable service performance according to your business and performance requirements. When you configure an alert rule, you specify the aggregation interval for that rule. This aggregation interval is not affected by the aggregation interval set for the service. Alert rules also allow you to send an email notification or post a message to a JMS queue or topic about the violation.

Monitoring Architecture

The following diagram shows the architecture of AquaLogic Service Bus monitoring.

Figure 5-1 Monitoring Architecture


The Statistics Configuration Manager stores and manages the statistics configuration for each operational resource. An operational resource is defined as the unit for which statistical information can be collected by the monitoring subsystem. An operational resource includes a proxy service, service operations, and pipelines. The Statistics Configuration Manager is notified about changes in the service definition, such as adding, updating, or deleting a pipeline.

Each managed server in a cluster hosts a Statistics Collector. The Statistics Collector collects statistics on operational resources as directed by the Statistics Configuration Manager. The collector also keeps a history of samples within the aggregation interval for the collected statistics. At every system-defined checkpoint interval, the collector stores a snapshot of the current statistics in a persistent store for recovery purposes and sends the information to the Aggregator.

One of the managed servers in a cluster, called the Aggregating Server or Aggregator, is designated as the aggregator for cluster-wide statistics. At system-defined checkpoint intervals, each managed server in the cluster sends a checkpoint snapshot of its contributions to the Aggregator. The Aggregator then combines this information to offer cluster-wide statistics to its clients through Retriever APIs. The clients of Aggregator are the Dashboard, SLA Manager, and Service Monitoring modules.

To contribute a data point to the system, an operational resource in the system, such as a proxy service pipeline run time, calls a method on the Statistics Collector, and identifies itself, the statistic, and the data point.
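As a rough illustration of the flow just described, the following Python sketch models a per-server collector that accepts data points and an aggregator that merges checkpoint snapshots into cluster-wide totals. The class and function names (StatisticsCollector, contribute, checkpoint, aggregate) and the resource and statistic names are hypothetical and do not correspond to actual AquaLogic Service Bus APIs. Only additive statistics such as counts are modeled; averages or minimum and maximum values would need different merge logic.

```python
from collections import defaultdict


class StatisticsCollector:
    """Hypothetical stand-in for the collector hosted on one managed server."""

    def __init__(self, server_name):
        self.server_name = server_name
        self._counts = defaultdict(int)

    def contribute(self, resource, statistic, value):
        # The caller identifies itself, the statistic, and the data point.
        self._counts[(resource, statistic)] += value

    def checkpoint(self):
        # Snapshot of this server's contributions at the checkpoint.
        return dict(self._counts)


def aggregate(snapshots):
    """Merge per-server snapshots into cluster-wide totals (counts only)."""
    totals = defaultdict(int)
    for snapshot in snapshots:
        for key, value in snapshot.items():
            totals[key] += value
    return dict(totals)


server1 = StatisticsCollector("managed1")
server2 = StatisticsCollector("managed2")
server1.contribute("PurchaseOrderProxy", "message-count", 3)
server2.contribute("PurchaseOrderProxy", "message-count", 2)
server2.contribute("PurchaseOrderProxy", "error-count", 1)

cluster_totals = aggregate([server1.checkpoint(), server2.checkpoint()])
print(cluster_totals)
```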

The Dashboard shows the overall health related information of AquaLogic Service Bus. It provides an overview of the state of the system organized by server, services, and alerts.

After monitoring is enabled, the Service Monitoring Summary page in the AquaLogic Service Bus Console provides a view of the statistics collected for each service. It also provides information about the alerts generated due to SLA violations.

As previously mentioned, an SLA is an agreement that defines the precise level of service expected from business and proxy services in AquaLogic Service Bus. The SLA Manager, with the help of the AquaLogic Service Configuration module, allows users to configure SLA rule conditions and actions. The SLA Manager monitors SLA violations with the help of data provided by the Aggregator and sends notifications as configured in the alert rule actions. The SLA Manager is always deployed with the Aggregator and resides on only one managed server in the cluster. The SLA Manager passes alerts to the Alert Log, which stores them in the Alert Store.

Monitoring Services

When you create a business or proxy service, monitoring is disabled by default for that service. Enable monitoring as follows:

When creating alert rules, you must enable monitoring before you create the rule. For more information, see Alert Rules and "Create an Alert Rule" in Monitoring in the Using the AquaLogic Service Bus Console.

Refresh Rate of Monitored Information

At run time, the default refresh rate for the Dashboard page is one minute. However, it may take up to three minutes for the information to be displayed on the Dashboard. This delay happens because of the time gaps between when the messages are processed by the proxy service, when the metrics are collected, and the refresh rate of the Dashboard. The system works as follows:

  1. Every minute the data collector sends the current snapshot to the aggregator.
  2. Every 60 seconds, the aggregator merges all the documents it has received from the managed servers within the last minute.
  3. The AquaLogic Service Bus Console refreshes every minute; that is, it runs a query on the aggregated document and then displays the results.

Figure 5-2 Aggregation Time Line

For example, a proxy service starts sending data in T1, as shown in Figure 5-2. At T2 (the second minute), the collector sends the data to the aggregator. However, if an aggregation cycle has just occurred, the aggregator does not merge this data until the next aggregation cycle, which occurs after one minute, or a maximum of two minutes from the previous aggregation cycle. Once the data is merged, it is available to the AquaLogic Service Bus Console. Because the console refreshes every minute, if a refresh cycle has just passed, the data is not displayed on the console until the third minute. Therefore, three minutes is the maximum delay.
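The three-minute figure is the sum of the worst-case wait at each of the three stages. The following Python sketch works through that arithmetic; the constant and function names are illustrative assumptions, and the one-minute periods are the defaults described above.

```python
# Default period of each stage, in minutes (values taken from the steps above).
COLLECTOR_PERIOD_MIN = 1   # collector sends a snapshot to the aggregator
AGGREGATOR_PERIOD_MIN = 1  # aggregator merges the snapshots it has received
CONSOLE_REFRESH_MIN = 1    # console queries the aggregated data


def worst_case_delay(*stage_periods):
    """Worst case: the data just misses every stage and waits a full period each time."""
    return sum(stage_periods)


max_delay = worst_case_delay(COLLECTOR_PERIOD_MIN,
                             AGGREGATOR_PERIOD_MIN,
                             CONSOLE_REFRESH_MIN)
print(max_delay)  # 3
```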

You change the Dashboard polling interval in the System Administration module in the AquaLogic Service Bus Console. For information on how to do this, see "Setting the Dashboard Polling Interval Refresh Rate" in System Administration in the Using the AquaLogic Service Bus Console.

Dashboard

When you log in to the AquaLogic Service Bus Console, the Dashboard is automatically displayed. The Dashboard shows the monitoring information for the last 30 minutes. It provides an overview of the state of the system organized by server, services, and alerts, as shown in the following figure.

Figure 5-3 AquaLogic Service Bus Dashboard


As shown in the previous figure, the Dashboard displays the following information:

From the Dashboard, you can drill down into the system and easily find specific information, such as the average execution time of a service, the date and time an alert occurred, or the length of time a server has been running.

You configure the Dashboard and monitoring in the AquaLogic Service Bus Console, which is described in the Monitoring and System Administration sections of the Using the AquaLogic Service Bus Console.

 


Service Summary

This section contains information on the following topics:

About the Service Summary

The Service Summary panel provides an overview of the state of the services. The Service Summary pie chart shows the percentage of alerts according to their severity, over the last 30 minutes, for all services that have alerts defined and monitoring enabled. The severity level of alerts is user configurable and has no absolute meaning. Severity types include Fatal, Critical, Major, Minor, Warning, and Normal. The services with the highest-severity alerts are listed beneath the pie chart, as shown in the following figure. Up to ten services can be listed, in descending order starting with the service that has the most alerts.

Figure 5-4 Services Summary Pane


From the Service Summary panel, you can access more information about alerts by clicking the following:

Each of these pages is fully described in the sections that follow.

Warning: When a service (or its component; for example, a pipeline node) is renamed or relocated, its statistical data is lost.

For information on how to access detailed alert information, see "Viewing the Dashboard Statistics" in Monitoring in the Using the AquaLogic Service Bus Console.

Service Monitoring Summary

The Service Monitoring Summary page provides two views of service monitoring statistics, as shown in the following figures.

The first view is a moving statistic of the data collected by each service. This view is available when you select Current Aggregation Interval in the Show Metrics For field. The aggregation interval shown in the Aggregation Interval column determines the statistics that are displayed. For example, if the aggregation interval of a particular service is 20 minutes, that service's row displays the data collected in the last 20 minutes.

Figure 5-5 Service Monitoring Summary Page—Current Aggregation Interval


The second view is a running count of the metrics. This view is available when you select Since Last Reset in the Show Metrics For field. The statistics displayed in each row are for the period since you last reset statistics for an individual service or since you last reset statistics for all services on the Global Settings page in the System Administration module.

Figure 5-6 Service Monitoring Summary Page—Since Last Reset


As shown in the top section of the preceding figures, you can filter the display of information using the following criteria:

The Service Monitoring Summary table displays the following information:

Note: An Action column is displayed when you have selected Since Last Reset in the Show Metrics For field. In this column, you can click the Reset Statistics icon for a specific service to reset the statistics for that service. When you confirm you want to do this, the system deletes all monitoring statistics that were collected for the service since the last time you clicked the Reset Statistics icon or the last time you clicked Reset Statistics on the Global Settings page. However, the system does not delete the statistics being collected during the Current Aggregation Interval for the service. Additionally, after you click the Reset Statistics icon, the system immediately starts collecting monitoring statistics for the service again.
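The reset behavior described in this note can be pictured as two independent counters, only one of which is cleared by a reset. The Python sketch below is a simplified model, not the product's implementation; the class and attribute names are hypothetical, and the aging-out of the current aggregation interval is deliberately left out.

```python
class ServiceStats:
    """Hypothetical model of the two views of a service's message count."""

    def __init__(self):
        self.since_last_reset = 0   # running count shown under Since Last Reset
        self.current_interval = 0   # count for the Current Aggregation Interval

    def record_message(self):
        self.since_last_reset += 1
        self.current_interval += 1

    def reset_statistics(self):
        # Only the running count is cleared; the statistics being collected
        # for the current aggregation interval are left untouched.
        self.since_last_reset = 0


stats = ServiceStats()
for _ in range(3):
    stats.record_message()

stats.reset_statistics()
stats.record_message()   # collection resumes immediately after the reset

print(stats.since_last_reset, stats.current_interval)  # 1 4
```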

Service Monitoring Details

The Service Monitoring Details page provides you with two views of detailed information about a specific service, as shown in the following figures.

The first view is a moving statistic of the data collected by the service. This view is available when you select Current Aggregation Interval in the Show Metrics For field. The aggregation interval shown in the Aggregation Interval column determines the statistics that are displayed. For example, if the aggregation interval of this service is 20 minutes, the view displays the data collected in the last 20 minutes.

Figure 5-7 Service Monitoring Details Page—Current Aggregation Interval


The second view is a running count of the metrics. This view is available when you select Since Last Reset in the Show Metrics For field. The statistics displayed are for the period since you last reset statistics for this particular service or since you last reset statistics for all services on the Global Settings page in the System Administration module.

Figure 5-8 Service Monitoring Details Page—Since Last Reset


The displayed details have the following definitions:

 


Server Summary

This section contains information on the following topics:

About the Server Summary

The Server Summary panel provides an overview of the state of the servers. The pie chart shows the status of each server in the domain. The status for each server is derived from the WebLogic Diagnostic Service (see Configuring and Using the WebLogic Diagnostics Framework). The ten most critical servers are displayed, as shown in Figure 5-9.

Figure 5-9 Server Summary Pane


The displayed statuses have the following meanings:

Log Summary

The AquaLogic Service Bus Console allows you to view the WebLogic Server domain log. The domain log file provides a central location from which to view the overall status of the domain. Each server instance forwards a subset of its messages to a domain-wide log file. By default, servers forward only messages of severity level NOTICE or higher. You can modify the set of messages that are forwarded. For more information, see Understanding WebLogic Logging Services in Configuring Log Files and Filtering Log Messages.
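The default forwarding rule amounts to a threshold comparison on an ordered list of severities. The following Python sketch illustrates the idea only. The severity names and their ordering are an assumption based on the levels commonly documented for WebLogic Server, so confirm them against the documentation for your release; the function name is_forwarded is hypothetical.

```python
# Assumed severity ordering, lowest to highest; verify for your WebLogic release.
SEVERITIES = ["TRACE", "DEBUG", "INFO", "NOTICE", "WARNING",
              "ERROR", "CRITICAL", "ALERT", "EMERGENCY"]


def is_forwarded(severity, threshold="NOTICE"):
    """Return True if a message of this severity meets or exceeds the threshold.

    Raises ValueError if either name is not in SEVERITIES.
    """
    return SEVERITIES.index(severity.upper()) >= SEVERITIES.index(threshold.upper())


print(is_forwarded("INFO"))     # False: below the default threshold
print(is_forwarded("NOTICE"))   # True
print(is_forwarded("ERROR"))    # True
```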

If you configure the logging action in a pipeline, the log is forwarded to the server log. Unless you configure WebLogic Server to forward these messages to the domain log, you cannot view this log from the AquaLogic Service Bus Console. For information on how to do this, see Create Log Filters in the WebLogic Server Administration Console Online Help.

To see the number of messages currently raised by the system, click the View Log Summary link in the Server Summary panel. A table is displayed that contains the number of messages grouped by severity, as shown in the following figure.

Figure 5-10 Log Summary


The displayed message statuses have the following meanings:

This display is based on the health state of the running servers, as defined by the WebLogic Diagnostic Service. For more information about the WebLogic Diagnostic Service, see Configuring and Using the WebLogic Diagnostics Framework.

To view the domain log for a particular type of message, click the number corresponding with the type of message. The following figure shows an example of a domain log file displayed in the AquaLogic Service Bus Console.

Figure 5-11 Domain Log File Entries


The following information is displayed:

For more information, see "Message Attributes" in Understanding WebLogic Logging Services in Configuring Log Files and Filtering Log Messages.

To display details of a single log file on the page, select the radio button for the appropriate log, then click the View button.

Server Summary

The Server Summary page provides a customizable table of servers, as shown in the following figure.

Figure 5-12 Server Summary Page


As shown in the top section of the preceding figure, the Server Summary Page displays the number of messages currently raised by the system. For information about the meaning of each type of status message, see Log Summary.

The server table displays the following information:

To view this information in the table as a pie or bar chart, click View as a Graph.

To filter the display of servers, click Customize Table above the server table. The available filtering is shown in the following figure.

Figure 5-13 Server Summary Table Filter


For information about how to use the Server Summary Table Filter, see "Customize Your View of the Server Summary" in Monitoring in the Using the AquaLogic Service Bus Console.

Server Details

You can access the View Server Details page by clicking the name of a server under Most Critical Servers or by clicking the name of a server on the Server Summary page.

The View Server Details page enables you to view more server monitoring details, as shown in the following figure.

Figure 5-14 Server Details Page—General Tab


The information displayed on this page is a subset of the Monitoring tab in the AquaLogic Service Bus Console Server Settings page. The details available are:

For more information, see the WebLogic Server Administration Console Online Help.

 


Alert Summary

This section contains information on the following topics:

About the Alert Summary

The Alert Summary panel contains a customizable table displaying information about violations or occurrences of events in the system. These violations and occurrences are based on SLAs. AquaLogic Service Bus provides various SLA monitors that you can configure to monitor proxy and business services. Some examples of SLA monitors are maximum execution time and authorization failure. You configure these monitors by creating alert rules. When a rule evaluates to true, it raises an alert. Additionally, you can configure an alert rule to send an email or post a message to a JMS queue or topic.

Note: When you configure an alert rule to post a message to a JMS destination, you must create a JMS connection factory and a queue or topic, and target them to the appropriate JMS server in the WebLogic Server Administration Console. For information on how to do this, see "Configuring a JMS Connection Factory" and "JMS Resource Naming Rules for Domain Interoperability" in Configuring JMS System Resources in Configuring and Managing WebLogic JMS.

The AquaLogic Service Bus Console provides several ways to view and find alerts, such as by severity and by service. You can also view alerts graphically. For information on how to do this, see "Listing and Locating Alerts" and "Viewing a Chart of Alerts" in Monitoring in the Using the AquaLogic Service Bus Console.

The following figure shows the Alert Summary panel:

Figure 5-15 Alert Summary Panel


The Alert Summary panel shows alerts for the last 30 minutes. It contains the following types of information:

To view a complete list of alerts, click View Alert Summary List. See System Alerts History.

To customize the information displayed in the Alert Summary panel, click Customize Table above the summary table. The available filtering is shown in the following figure.

Figure 5-16 Alert Summary Table Filter


System Alerts History

To access the Customized System Alerts History page, in the Alert Summary panel, click View Alert Summary List. The Customized System Alerts History page enables you to view all the alerts by paging through the table (Figure 5-17) or by filtering the display of the alerts (Figure 5-18).

Figure 5-17 Customized System Alerts History


The table shown in the preceding figure is customizable and provides the following information:

To view a pie or bar chart of the alerts, click View Graph in the table.

To search for a specific alert, you can filter the display of alerts by clicking Customize Table in the Customized System Alerts History table. The available filtering is shown in the following figure.

Figure 5-18 System Alerts Table Filter


For information about how to use the Alerts Table Filter, see "Customizing Your View of Alerts" in Monitoring in the Using the AquaLogic Service Bus Console.

Note: When an alert is fired in your configuration, a message is sent to your domain log, which resides at the following location:

[BEA_home\servers\<server_name>\logs\<domain_name>.log

Where domain_name represents the name you assigned your AquaLogic Service Bus domain when you created it.

The message is logged as an alert and has this message ID: BEA-394015

The message body is a string that consists of the following elements:

System Alert Details

The System Alert Details page displays complete information about the alert and allows you to add an annotation to the alert, as shown in the following figure.

Figure 5-19 Rule Details Page


The following information is displayed:

You access this page from the Dashboard by clicking Alert Severity in the Alert Summary table. This page also allows you to delete the alert.

View Alert Rule Details

The View Alert Rule Details page displays complete information about a specific alert rule, as shown in the following figure.

Figure 5-20 View Alert Rule Details Page


The following information is displayed:

For information about how to define alert rules, see "Create an Alert Rule" in Monitoring in the Using the AquaLogic Service Bus Console.

 


Alert Rules

This section includes information on the following topics:

About Alert Rules

As mentioned earlier, alerts are automated responses to SLA violations or occurrences, which are displayed on the Dashboard. You define alert rules to specify unacceptable service performance according to your business and performance requirements. When you configure an alert rule, you specify the aggregation interval for that rule. The alert aggregation interval is not affected by the aggregation interval set for the service.

Rules are executed once every aggregation interval. On the Alert Rule page, if you set the Alert Frequency to Every Time, the rule's actions are executed every time the alert rule evaluates to true. If you set the Alert Frequency to Once Until Conditions Clear, the rule's actions are executed the first time the rule evaluates to true, and no more alerts are generated until the condition resets itself and evaluates to true again.

In the case where the Alert Frequency is set to Every Time, the number of times an alert rule is fired depends on the aggregation interval and the sample interval associated with that rule. For example, if the aggregation interval is set to 5 minutes, the sample interval is 1 minute. Rules are evaluated each time 5 samples of data are available. Therefore, the rule is evaluated for the first time approximately 5 minutes after it is created and every minute thereafter.
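The timing in this example can be worked out mechanically. The Python sketch below is illustrative only: the function name is hypothetical, and the sample interval is passed in as a parameter because the text gives its value only for a 5-minute aggregation interval.

```python
def evaluation_minutes(aggregation_interval, sample_interval, horizon):
    """Minutes (up to horizon) at which a rule is evaluated.

    The first evaluation happens once a full interval's worth of samples is
    available; after that, the rule is evaluated each time a new sample arrives.
    """
    samples_needed = aggregation_interval // sample_interval
    first_evaluation = samples_needed * sample_interval
    return list(range(first_evaluation, horizon + 1, sample_interval))


schedule = evaluation_minutes(aggregation_interval=5, sample_interval=1, horizon=10)
print(schedule)  # [5, 6, 7, 8, 9, 10]
```

With Alert Frequency set to Every Time, each of these evaluations can fire the rule's actions if the condition is true at that point.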

In the case where the Alert Frequency is set to Once Until Conditions Clear, after an alert is fired the first time in an aggregation interval, it is not fired again in the same aggregation interval.
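The difference between the two Alert Frequency settings can be sketched as a small state machine over successive rule evaluations. This is a simplified model of the behavior described above, not the product's implementation; the function name and the strings "every_time" and "once_until_clear" are hypothetical labels chosen for the sketch.

```python
def fired_alerts(evaluations, frequency):
    """Return the indices of the evaluations at which the rule's actions run.

    evaluations -- sequence of booleans, one per rule evaluation
    frequency   -- "every_time" or "once_until_clear" (hypothetical labels)
    """
    fired = []
    armed = True  # True when the condition has cleared since the last alert
    for index, condition_true in enumerate(evaluations):
        if not condition_true:
            armed = True          # condition cleared, so the rule re-arms
            continue
        if frequency == "every_time" or armed:
            fired.append(index)
        armed = False
    return fired


results = [False, True, True, True, False, True]
print(fired_alerts(results, "every_time"))        # [1, 2, 3, 5]
print(fired_alerts(results, "once_until_clear"))  # [1, 5]
```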

Creating an alert rule involves three parts:

Note: Rules can only be created for services that are enabled for monitoring.

Detailed information about creating an alert rule is located in "Create an Alert Rule" in Monitoring in the Using the AquaLogic Service Bus Console.

Some Uses for Alerts

The following are some uses for alerts:

Understanding Alert Rules

The information in this section is presented in question-answer format.

Question 1: I created a service with an alert rule that has the following condition expression:

Aggregation Interval: 0 Hours(s) and 1 Minutes
Message Count = 0

It's been 10 minutes and I have not received any alerts.

Answer: Collection of each monitoring statistic associated with a service, such as message count and error count, begins when the value of that statistic first changes. Data collection for the Message Count attribute begins when the first message is processed by the service and the Message Count attribute is incremented. Similarly, collection of data for the Error Count statistic starts only when the service encounters its first error and the Error Count attribute is incremented. If the service is idle, no monitoring information is collected for that service and consequently no alert rules are triggered. After the first message is processed, monitoring data for that service is collected continually, even if the service does not receive any further requests. Check whether the service has received any requests.

Question 2: I defined a new alert rule with an aggregation interval that did not exist before and that rule does not seem to fire at all. All other rules created prior to this one are working correctly.

Answer: The cause is the same as in Question 1. After a rule with a new aggregation interval is created, the service must process at least one request before the alert rule can be triggered. The other rules, which are defined with different aggregation interval values, are not affected by the new alert rule.

Question 3: I restarted the server and none of my services have processed any requests. Why do I see alerts being generated?

Answer: Once the Monitoring subsystem has started collecting data for services, killing and restarting a server does not abort the collection process. The data collected is persisted and statistic collection picks up from where it left off.

Question 4: I have an alert rule with the following definition:

Aggregation Interval: 0 Hours(s) and 5 Minutes
Success Rate < 80%

The Service Monitoring Summary page shows the following values:

Message Count: 4

Error Count: 1

Why am I being alerted in this case? Shouldn't the success rate be 80% in this case?

Answer: No, the message count value displayed is the total of all messages processed by the service, including the ones that generated an error. Consequently, in this case, three of the four messages succeeded and the success rate is 75%.
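The arithmetic behind this answer is shown in the Python sketch below. The function name is hypothetical; the formula follows directly from the statement that Message Count already includes the messages that produced errors.

```python
def success_rate(message_count, error_count):
    """Percentage of processed messages that did not produce an error."""
    if message_count == 0:
        raise ValueError("no messages processed")
    return 100.0 * (message_count - error_count) / message_count


rate = success_rate(message_count=4, error_count=1)
print(rate)        # 75.0
print(rate < 80)   # True, so the rule "Success Rate < 80%" fires
```

The mistaken 80% figure comes from treating Message Count as the number of successes, that is, computing 4 / (4 + 1).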

Question 5: I created a service with an aggregation interval of 10 minutes that sends a JMS message. I could see the message on the Service Monitoring Summary page, but some time later the message count for my service shows as zero.

Answer: The Service Monitoring Summary page displays a moving statistic. In this case, it shows the message count in the last 10 minutes. Because no messages were processed by the system in the last 10 minutes, the message count is displayed as zero.
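The idea of a moving statistic can be illustrated with a minimal sketch. It assumes a simple per-message timestamp window; the class MovingMessageCount is hypothetical and does not reflect how the Monitoring subsystem is actually implemented.

```python
from collections import deque


class MovingMessageCount:
    """Count messages seen within the last `interval_seconds` (illustrative only)."""

    def __init__(self, interval_seconds: int):
        self.interval_seconds = interval_seconds
        self._timestamps = deque()

    def record(self, timestamp: float) -> None:
        # Remember when each message was processed.
        self._timestamps.append(timestamp)

    def count(self, now: float) -> int:
        # Drop messages that have fallen out of the aggregation interval.
        cutoff = now - self.interval_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


window = MovingMessageCount(interval_seconds=10 * 60)
window.record(timestamp=0)     # one message processed at t = 0 s
print(window.count(now=60))    # 1: the message is inside the last 10 minutes
print(window.count(now=660))   # 0: 11 minutes later it has aged out
```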

Question 6: I changed the aggregation interval of a service from 10 minutes to 5 minutes. The Service Monitoring Summary page now shows all statistics as zero. One of the alert rules for this service was configured on a statistic with a 2-minute aggregation interval, and it did not fire the next minute.

Answer: Changing the aggregation interval for a service removes the statistical information for all the services and alerts associated with that service. The alert initializes again and fires after the next aggregation interval expiry.

Question 7: I have a business service with multiple endpoints with an alert rule defined as Failover-count > 0. When one of the endpoints goes down, the alert is triggered. However, when a service has only one endpoint, the Failover-count is not incremented for this service. Instead, an error is generated.

Answer: Set the Retry count to a number greater than zero. For information about setting the Retry count, see "Adding a Business Service" in the Business Services chapter of Using the AquaLogic Service Bus Console.

Question 8: I see that an alert is generated on the Dashboard but the value for the Alerts for last Aggregation Interval field on the Service Monitoring Details page displays zero.

Answer: Alert rules are evaluated after the aggregation interval completes, which happens after a checkpoint completes. If a rule evaluates to true, the rule's actions are triggered, a log entry is generated, and the interval-count statistic attribute (Alerts for last Aggregation Interval) is incremented. The updated value of this counter is processed in the next checkpoint, 60 seconds later. As a result, the Service Monitoring Details page displays the updated count approximately one minute after the alert is generated.

Question 9: How does the active time for rules that span midnight work?

Answer: Consider the case where the active time for a rule is specified as 22:00 to 09:00.

On a given date, say June 7, the rule will be active and inactive as follows:

June 6, 10:00 P.M. to June 7, 9:00 A.M. - Active

June 7, 9:01 A.M. to June 7, 9:59 P.M. - Inactive

June 7, 10:00 P.M. to June 8, 9:00 A.M. - Active
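The schedule above can be expressed as a small check. This is a sketch under stated assumptions: the function rule_is_active is hypothetical, times are compared at minute granularity, and both the start and end times are treated as inclusive, which matches the example (9:00 A.M. is active, 9:01 A.M. is inactive).

```python
from datetime import time


def rule_is_active(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls inside the active window (illustrative only)."""
    if start <= end:
        # Window within a single day, for example 09:00 to 17:00.
        return start <= now <= end
    # Window spans midnight, for example 22:00 to 09:00:
    # active late in the evening or early in the morning.
    return now >= start or now <= end


start, end = time(22, 0), time(9, 0)
print(rule_is_active(time(23, 30), start, end))  # True
print(rule_is_active(time(9, 0), start, end))    # True
print(rule_is_active(time(9, 1), start, end))    # False
print(rule_is_active(time(21, 59), start, end))  # False
```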

The Collector sends ServerStatistics to the aggregator. ServerStatistics represents the monitoring run-time data for that minute; that is, it contains the statistics for the services that have monitoring enabled.

Every minute, the aggregator aggregates the data received from the Collector and makes it available to the retriever subsystem. The aggregator thread is offset by 15 seconds with respect to the Collector checkpoint thread.

If you disable monitoring for the domain, you disable the statistics collection and the checkpointing process. The Collector no longer sends ServerStatistics to the aggregator server and the aggregator server does not have any aggregated data from the next minute, which means there is no data returned if you attempt to retrieve it. The same applies when you enable monitoring for the domain. The system initially does not show any data. However, after a maximum of two minutes, the aggregator has data and the Service Summary page displays this data.



 

 
