15 Monitoring and Managing Endpoint URIs for Business Services
This chapter describes how to manage endpoint URIs in business services, including configuring retries, marking non-responsive endpoints as offline, viewing endpoint metrics, and triggering alerts based on endpoint status.
About Endpoint URI Management
At runtime, you can monitor metrics for each endpoint URI to ensure that all of them are performing as expected.
When you notice issues with an endpoint URI, you can mark the URI as offline to avoid repeated attempts to access it. Alternatively, you can configure the business service to mark non-responsive URIs as offline automatically.
About Endpoint URIs
An endpoint URI is the URL of an external service that is accessed by a business service. In Service Bus, you must define at least one endpoint URI for a business service. When you define multiple endpoint URIs for a business service, the load balancing algorithm you define controls the manner in which the business service tries to access the endpoint URIs. A business service can use one of the following load balancing algorithms:
- Round robin
- Random
- Random-weighted
- None
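The four algorithms above can be illustrated with a minimal sketch. This is not Service Bus code: the function name `make_selector`, the algorithm strings, and the `weights` parameter are hypothetical, and the behavior shown for None (always starting with the first configured URI) is an assumption rather than something stated in this chapter.

```python
import itertools
import random


def make_selector(uris, algorithm, weights=None, rng=None):
    """Return a zero-argument function that picks the URI to try first.

    Illustrative only; names and algorithm strings are hypothetical.
    """
    rng = rng or random.Random()
    if algorithm == "round-robin":
        # Walk the configured list in order, wrapping around at the end.
        cycle = itertools.cycle(uris)
        return lambda: next(cycle)
    if algorithm == "random":
        # Every URI is equally likely on each request.
        return lambda: rng.choice(uris)
    if algorithm == "random-weighted":
        # URIs with a larger weight are picked proportionally more often.
        return lambda: rng.choices(uris, weights=weights, k=1)[0]
    if algorithm == "none":
        # Assumption: always start with the first URI in the list.
        return lambda: uris[0]
    raise ValueError(f"unknown algorithm: {algorithm}")
```

For example, `make_selector(["a", "b", "c"], "round-robin")` returns a function that yields `a`, `b`, `c`, and then `a` again on successive calls.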
When you configure a business service, you can also configure how retries are handled. For more information, see "About Business Service URI Retries" in Developing Services with Oracle Service Bus.
Offline and Online Endpoint URIs
You can configure a business service to mark non-responsive URIs offline, which prevents a business service from repeatedly attempting to access a non-responsive URI and therefore avoids the communication errors caused by trying to access a non-responsive URI. If Service Bus automatically marks an endpoint URI offline, Service Bus can bring it back online after a time period you specify, or Service Bus can keep it offline until you change the status manually. You can manually change the status of an endpoint URI to online or offline using Fusion Middleware Control or using the public APIs. When you mark an endpoint URI online in a cluster domain, it is marked online on all the Managed Servers.
Service Bus automatically marks an endpoint URI online when any of the following occur:
- You add the endpoint URI to a business service.
- You restart a server.
- You enable a disabled service.
- You rename or move a service.
- A business service successfully accesses the URI after the retry interval you configured has passed.
When you configure a business service to mark non-responsive URIs offline automatically, you can make this state temporary or permanent (that is, until you manually update the status).
For more information, see Configuring Operational and Global Settings.
About Temporarily Offline Endpoint URIs
Mark an endpoint URI offline temporarily if you want the business service to automatically retry the same endpoint after a short interval of time; mark it offline permanently if you want the business service to treat the endpoint URI as offline until it is reset manually.
When marked offline temporarily, the endpoint URI status is changed to offline on encountering a communication error. When the retry interval has passed and the business service attempts to process a new request, it tries to access this endpoint URI. If this attempt is successful, the endpoint URI is marked online again. If the attempt fails, the URI is marked offline again for the duration of the retry interval, and the cycle is repeated. This configuration is useful when a communication error is temporary and corrects itself. For example, when an endpoint becomes temporarily overloaded, communication errors occur but the endpoint reverts to normal operation without requiring manual intervention.
About Permanently Offline Endpoint URIs
When marked offline permanently, the endpoint URI status is changed to offline on encountering a communication error, and the status remains offline until you manually mark the endpoint URI online again. This configuration is useful for a case in which a communication error is caused by a problem with the endpoint URI that must be resolved by manual intervention.
If you want to keep non-responsive URIs offline until you take corrective action and then manually mark the URIs as online, do not provide a retry interval. A retry interval of zero indicates that the endpoint remains offline indefinitely.
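The temporary and permanent offline behaviors described in the preceding sections can be modeled as a small state machine. The following is a sketch under stated assumptions, not the Service Bus implementation: the class `EndpointState` and its methods are hypothetical, and time is passed in explicitly as a number of seconds so the logic is easy to follow.

```python
class EndpointState:
    """Tracks online/offline status for one endpoint URI (illustrative only)."""

    def __init__(self, retry_interval_seconds):
        # A retry interval of 0 means the URI stays offline until reset manually.
        self.retry_interval = retry_interval_seconds
        self.online = True
        self.offline_since = None

    def is_eligible(self, now):
        """Return True if a new request may be sent to this URI at time `now`."""
        if self.online:
            return True
        if self.retry_interval == 0:
            return False  # permanently offline
        return now - self.offline_since >= self.retry_interval

    def record_result(self, success, now):
        """Update the status after an attempt to access the URI."""
        if success:
            self.online = True
            self.offline_since = None
        else:
            # A communication error marks the URI offline and restarts the interval.
            self.online = False
            self.offline_since = now

    def mark_online(self):
        """Manual reset, as an administrator would do."""
        self.online = True
        self.offline_since = None
```

With a 30-second retry interval, a failure at time 0 makes the URI ineligible until time 30. If the attempt at time 30 also fails, the cycle repeats. With an interval of 0, only `mark_online()` makes the URI eligible again.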
Offline URIs in Clustered Environments
A communication error can occur due to a network problem on a machine hosting a Managed Server. Such an event is interpreted by the business service as the endpoint URI being non-responsive (although the remote endpoint being accessed is responsive). A communication error can also occur because the endpoint URI is not responding.
In the first case, the URIs are marked offline on only one server (on the machine with network problems) and online on all the other servers in the cluster. An SLA alert condition based on Evaluate on any server generates an alert, but an alert condition based on Evaluate on all servers does not generate an alert.
                        
In the second case, the URI is marked offline on all the Managed Servers (one by one, as each server tries to access that endpoint). As each Managed Server marks the endpoint URI offline, the alert rule condition based on Evaluate on any server is met and an alert is generated. When the endpoint URI is marked offline on the last of the servers in the cluster domain, the alert rule condition based on Evaluate on all servers is also met and this alert is also generated.
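The difference between the two evaluation modes can be sketched as follows. The function `alert_fires`, the mode strings, and the server names are hypothetical and only illustrate the any-versus-all logic described above.

```python
def alert_fires(offline_by_server, mode):
    """Decide whether an offline-URI alert condition is met.

    offline_by_server maps a Managed Server name to True if the URI is
    marked offline on that server. Illustrative only.
    """
    states = list(offline_by_server.values())
    if mode == "any":
        return any(states)
    if mode == "all":
        # Guard against an empty cluster, where all() would be vacuously True.
        return bool(states) and all(states)
    raise ValueError(f"unknown mode: {mode}")


# First case: a network problem on one machine only.
local_problem = {"ms1": True, "ms2": False, "ms3": False}
# Second case: the endpoint itself is down, so every server marks it offline.
endpoint_down = {"ms1": True, "ms2": True, "ms3": True}
```

Here `alert_fires(local_problem, "any")` is true but `alert_fires(local_problem, "all")` is false, whereas both modes are true for `endpoint_down`.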
                        
For a clustered domain:
- When the Server field is set to Cluster or to one of the Managed Servers, Online status denotes that all of the endpoint URIs are online across the cluster or on the selected Managed Server, respectively.
- When the Server field is set to Cluster or to one of the Managed Servers, Offline status denotes that all of the endpoint URIs are offline across the cluster or on the selected Managed Server, respectively.
- When the Server field is set to Cluster, Partial status denotes that at least one of the endpoint URIs for the business service is offline on at least one of the servers, or that one of the endpoint URIs is offline on all the servers, but the other endpoint URIs for the same business service are still available on one or all the servers.
- When the Server field is set to one of the Managed Servers, Partial status denotes that at least one of the endpoint URIs for the business service is offline on the selected Managed Server.
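One way to read the rules above is as an aggregation over per-server, per-URI flags: all online gives Online, all offline gives Offline, and any mixture gives Partial. The sketch below encodes that reading. The function `service_status` and the data layout are assumptions made for illustration.

```python
def service_status(online_by_server, server="Cluster"):
    """Aggregate endpoint URI states for a business service.

    online_by_server: {server_name: {uri: True if online}}.
    server: "Cluster" or the name of one Managed Server.
    Illustrative only.
    """
    if server == "Cluster":
        # Consider every URI on every Managed Server.
        flags = [up for uris in online_by_server.values() for up in uris.values()]
    else:
        # Consider only the URIs as seen by the selected Managed Server.
        flags = list(online_by_server[server].values())
    if all(flags):
        return "Online"
    if not any(flags):
        return "Offline"
    return "Partial"


cluster = {
    "ms1": {"http://host-a/svc": True, "http://host-b/svc": False},
    "ms2": {"http://host-a/svc": True, "http://host-b/svc": True},
}
```

With this sample data, the status is Partial for the cluster and for `ms1`, and Online for `ms2`.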
Metrics for Monitoring Endpoint URIs
Fusion Middleware Control displays endpoint URI metrics so you can monitor the health of your business services. The JMX monitoring APIs also let you view endpoint URI metrics. For information on using Fusion Middleware Control, see Viewing Endpoint URI Metrics for a Business Service. For information on using the JMX monitoring APIs, see JMX Monitoring API.
In Fusion Middleware Control, the endpoint URI metrics are available on the Dashboard tab for the business service on the Service Bus Project page. The available metrics include the state, message and error counts, and response times. The following items describe the expected behavior when you monitor endpoint URIs on Fusion Middleware Control:
- Statistics are available only when you enable monitoring for a business service.
- Renaming or moving a service resets the URI-level statistics.
- Changing the aggregation interval resets all the URI-level statistics except the URI status.
- Resetting statistics for the service (or resetting all statistics) resets all the URI-level statistics except the URI status.
- Adding a new URI to an existing business service automatically initiates collecting the metrics for the new URI.
Endpoint URI State
The State statistic on the business service Dashboard of Fusion Middleware Control indicates whether the endpoint URI is online or offline. You can also obtain the status of an endpoint URI using the JMX monitoring APIs. Table 15-1 describes the possible states of an endpoint URI.
Table 15-1 Status of Endpoint URIs
| Status | Description | 
|---|---|
| Online | Indicates that the URI is online on a given server. In a cluster it indicates that the URI is online for all servers. | 
| Offline | Indicates that the URI is offline on a given server. In a cluster it indicates that the URI is offline for all servers. | 
| Partial | Indicates that at least one server in the cluster reports a problem for that URI. This metric is available for clusters only. | 
Note:
When a URI is associated with more than one business service, the same endpoint URI can have a different status for each of the business services.
Endpoint URI Performance Metrics
The endpoint URI performance metrics show how many messages a given endpoint has processed, how many of them failed, and how long the endpoint took to respond. The following metrics help you monitor the health of the endpoint URIs:
- Message Count: The number of messages processed by the endpoint URI.
- Error Count: The number of errors encountered by the endpoint URI.
- Minimum Response Time: The minimum time (in milliseconds) that this service has taken to execute messages.
- Maximum Response Time: The maximum time (in milliseconds) that this service has taken to execute messages.
- Average Response Time: The average time (in milliseconds) that this service has taken to execute messages.
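As a rough illustration of how these five metrics relate to individual requests, the sketch below derives them from a list of samples. The function `summarize` and the sample layout are hypothetical. Whether Service Bus includes failed messages in the message count or in the response-time figures is not stated in this chapter, so the sketch simply assumes that it does.

```python
def summarize(samples):
    """Compute endpoint URI metrics from raw samples.

    samples: list of (response_time_ms, failed) tuples for one endpoint URI.
    Assumption: failed messages count toward the message total and the
    response times. Illustrative only.
    """
    times = [t for t, _ in samples]
    return {
        "message_count": len(samples),
        "error_count": sum(1 for _, failed in samples if failed),
        "min_response_ms": min(times) if times else None,
        "max_response_ms": max(times) if times else None,
        "avg_response_ms": sum(times) / len(times) if times else None,
    }


metrics = summarize([(10, False), (30, False), (20, True)])
```

For the three samples shown, the message count is 3, the error count is 1, and the minimum, maximum, and average response times are 10, 30, and 20.0 milliseconds.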
Configuring Service Bus to Take Unresponsive Endpoint URIs Offline
You can configure Service Bus to automatically mark an unresponsive endpoint offline to prevent continued attempts to reach the endpoint URI.
This can be a temporary state, based on a retry interval, or the endpoint URI can be taken offline permanently, that is, until you manually bring the endpoint URI back online. To do so, you must enable the Offline Endpoint URIs operational setting for the business service. The offline URI settings for the business service apply to all URIs in the service.
You can also use APIs to mark an offline endpoint URI as online. This is useful when you have not enabled monitoring for a business service but you need to mark its endpoint URIs online. For more information, see com.bea.wli.monitoring.ServiceDomainMBean in the Java API Reference for Oracle Service Bus.
                     
To configure Service Bus to mark an unresponsive endpoint URI offline:
Marking an Endpoint URI Offline Manually
When you monitor a business service in Fusion Middleware Control, you can view metrics for its associated endpoint URIs.
If you notice any issues with a specific endpoint URI, you can mark the endpoint URI as offline to prevent repeated attempts to access that URI. When you take an endpoint URI offline manually, it remains offline until you manually bring it back online.
To mark an endpoint URI offline manually:
Marking an Offline URI as Online
When an endpoint URI is marked offline, either automatically by Service Bus or manually by an administrator, you can manually mark it online again once you have taken steps to correct the error that caused the URI to be non-responsive.
When you mark an endpoint URI online again, Service Bus continues processing according to the business service endpoint URI configuration.
To mark an endpoint URI online manually:
Viewing Endpoint URI Metrics for a Business Service
Service Bus collects information about how each endpoint URI is processing messages. You can view message counts, error counts, and the minimum, maximum, and average response times. The business service Dashboard also shows whether the endpoint URI is online or offline.
To view endpoint metrics for a business service:
Creating Alerts Based on Endpoint URI Metrics
If an endpoint URI is not accessible, the business service trying to access it receives a communication error.
In addition to configuring a business service to take a non-responsive URI offline, as described in Configuring Service Bus to Take Unresponsive Endpoint URIs Offline, you can raise an alert when a system encounters a non-responsive URI by configuring SLA alert rules for a business service based on the endpoint URI status.
About Creating an SLA Alert Based on Endpoint URI Status
When you create an SLA alert based on a business service's endpoint URI status, an alert is generated when any endpoint URI or all endpoint URIs change state from online to offline, or from offline to online. For example, consider a business service for which two alert rules are configured, one based on the All URIs offline = True condition and another based on the Any URI offline = True condition. An alert generated by the All URIs offline = True condition signifies a severe problem, because all requests to this service are likely to fail until the situation is resolved. An alert generated by the Any URI offline = True condition, however, implies that the other endpoint URIs are responsive and subsequent requests may not fail.
                     
All alert rules are evaluated independently. If alerts based on both clauses (any URI and all URIs) have been configured for the same business service, it is likely that both alerts are generated simultaneously when the last endpoint URI is marked offline. If a business service has only one URI, the All URIs offline = True and Any URI offline = True clauses mean the same thing and so behave identically.
                     
The evaluation of an alert rule condition based on a transition from offline to online behaves in a similar fashion except that it tracks any or all endpoint URIs being marked back to online state.
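The interaction of the two conditions for the online-to-offline transition can be sketched as follows. This is one interpretation of the behavior described above, not the actual rule engine: the function `evaluate_rules` is hypothetical, and it assumes that a rule fires only when at least one URI has just gone offline.

```python
def evaluate_rules(previous, current):
    """Return the alert conditions met after a status change.

    previous, current: {uri: True if offline} before and after the change.
    Interpretation only: a rule fires when at least one URI has just
    transitioned from online to offline.
    """
    went_offline = [
        uri for uri, offline in current.items()
        if offline and not previous.get(uri, False)
    ]
    fired = []
    if went_offline:
        fired.append("Any URI offline = True")
        if all(current.values()):
            # The last online URI has gone offline, so both rules fire together.
            fired.append("All URIs offline = True")
    return fired
```

With two URIs, the first failure fires only the Any URI offline = True rule, and the second failure fires both rules. With a single URI, both rules always fire together.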
Creating an SLA Alert Based on Endpoint URI Status
You can create an alert rule based on an endpoint URI's status.
To create an SLA alert based on endpoint URI status:
Note:
To ensure that you do not miss any alerts triggered due to frequent changes in the status of the URI, Oracle recommends that you set the aggregation interval for alert rules based on the status of the URI to one minute. For more information on aggregation intervals, see Introduction to Aggregation Intervals.