14 Monitoring and Managing Endpoint URIs for Business Services

This chapter describes how to manage endpoint URIs in business services, including configuring retries, marking non-responsive endpoints as offline, viewing endpoint metrics, and triggering alerts based on endpoint status.

14.1 About Endpoint URI Management

In the runtime, you can monitor metrics for each endpoint URI to ensure they are all performing as expected.

When you notice issues with an endpoint URI, you can mark the URI as being offline to avoid repeated attempts at accessing the endpoint URI. You can alternatively configure the business service to mark non-responsive URIs as offline.

14.1.1 About Endpoint URIs

An endpoint URI is the URL of an external service that is accessed by a business service. In Service Bus, you must define at least one endpoint URI for a business service. When you define multiple endpoint URIs for a business service, the load balancing algorithm you define controls the manner in which the business service tries to access the endpoint URIs. A business service can use one of the following load balancing algorithms:

  • Round robin

  • Random

  • Random-weighted

  • None
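
The following Python sketch shows one plausible reading of how each algorithm could order the endpoint URIs for a single request. It is an illustrative model only, not Service Bus code: the function name attempt_order, the request_number and weights parameters, and the exact ordering semantics are assumptions made for this example.

```python
import random


def attempt_order(uris, algorithm, request_number=0, weights=None, rng=None):
    """Return the order in which one request would try the URIs (illustrative)."""
    rng = rng or random.Random()
    uris = list(uris)
    if algorithm == "none":
        # Assumed: always try the URIs in the order they are configured.
        return uris
    if algorithm == "round-robin":
        # Assumed: each new request starts one position further along the list.
        shift = request_number % len(uris)
        return uris[shift:] + uris[:shift]
    if algorithm == "random":
        # A random permutation of the configured URIs.
        return rng.sample(uris, len(uris))
    if algorithm == "random-weighted":
        # Draw without replacement; heavier URIs tend to be tried earlier.
        remaining = dict(zip(uris, weights))
        order = []
        while remaining:
            pick = rng.choices(list(remaining), weights=list(remaining.values()))[0]
            order.append(pick)
            del remaining[pick]
        return order
    raise ValueError(f"unknown algorithm: {algorithm}")
```

For example, with three URIs and the round-robin setting, request number 1 in this model starts at the second URI and wraps around to the first.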

When you configure a business service, you can also configure how retries are handled. For more information, see "About Business Service URI Retries" in Developing Services with Oracle Service Bus.

14.1.2 Offline and Online Endpoint URIs

You can configure a business service to mark non-responsive URIs offline, which prevents a business service from repeatedly attempting to access a non-responsive URI and therefore avoids the communication errors caused by trying to access a non-responsive URI. If Service Bus automatically marks an endpoint URI offline, Service Bus can bring it back online after a time period you specify, or Service Bus can keep it offline until you change the status manually. You can manually change the status of an endpoint URI to online or offline using Fusion Middleware Control or using the public APIs. When you mark an endpoint URI online in a cluster domain, it is marked online on all the Managed Servers.

Service Bus automatically marks an endpoint URI online when any of the following occur:

  • You add the endpoint URI to a business service.

  • You restart a server.

  • You enable a disabled service.

  • You rename or move a service.

  • A business service is able to successfully access the URI after the retry interval you have configured is past.

When you configure a business service to mark non-responsive URIs offline automatically, you can make this state temporary (until a retry interval passes) or permanent (until you manually update the status).

For more information, see Configuring Operational and Global Settings.

14.1.2.1 About Temporarily Offline Endpoint URIs

Mark an endpoint URI offline temporarily if you want the business service to automatically retry the same endpoint after a short interval of time; mark it offline permanently if you want the business service to treat the endpoint URI as offline until it is reset manually.

When marked offline temporarily, the endpoint URI status is changed to offline on encountering a communication error. When the retry interval has passed and the business service attempts to process a new request, it tries to access this endpoint URI. If this attempt is successful, the endpoint URI is marked online again. If the attempt fails, the URI is marked offline again for the duration of the retry interval, and the cycle is repeated. This configuration is useful when a communication error is temporary and corrects itself. For example, when an endpoint becomes temporarily overloaded, communication errors occur but the endpoint reverts to normal operation without requiring manual intervention.

14.1.2.2 About Permanently Offline Endpoint URIs

When marked offline permanently, the endpoint URI status is changed to offline on encountering a communication error, and the status remains offline until you manually mark the endpoint URI online again. This configuration is useful for a case in which a communication error is caused by a problem with the endpoint URI that must be resolved by manual intervention.

If you want to keep non-responsive URIs offline until you take corrective action and then manually mark the URIs as online, do not provide a retry interval. A retry interval of zero indicates that the endpoint remains offline indefinitely.
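
The temporary and permanent behaviors described in the two preceding sections can be sketched as a small state model. This is a hedged illustration rather than Service Bus code: the class EndpointState, its method names, and the use of a plain number of seconds as the clock are all hypothetical.

```python
class EndpointState:
    """Illustrative model of one endpoint URI's online/offline status."""

    def __init__(self, retry_interval_seconds):
        # A retry interval of 0 models "offline until marked online manually".
        self.retry_interval = retry_interval_seconds
        self.online = True
        self.offline_since = None

    def may_attempt(self, now):
        """Return True if a new request is allowed to try this URI at time `now`."""
        if self.online:
            return True
        if self.retry_interval == 0:
            return False
        return now - self.offline_since >= self.retry_interval

    def record_result(self, success, now):
        """Update the status after an access attempt."""
        if success:
            self.online = True
            self.offline_since = None
        else:
            # A communication error marks the URI offline and restarts the interval.
            self.online = False
            self.offline_since = now

    def mark_online(self):
        """Model an administrator manually marking the URI online."""
        self.online = True
        self.offline_since = None
```

In this model, a URI with a 30-second retry interval that fails at time 0 is skipped at time 10 and retried at time 30, while a URI with a retry interval of 0 is skipped until mark_online is called.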

14.1.2.3 Offline URIs in Clustered Environments

A communication error can occur due to a network problem on a machine hosting a Managed Server. Such an event is interpreted by the business service as the endpoint URI being non-responsive (although the remote endpoint being accessed is responsive). A communication error can also occur because the endpoint URI is not responding.

In the first case, the URIs are marked offline on only one server (on the machine with network problems) and online on all the other servers in the cluster. An SLA alert condition based on Evaluate on any server generates an alert, but an alert condition based on Evaluate on all servers does not generate an alert.

In the second case, the URI is marked offline on all the Managed Servers (one by one as each server tries to access that endpoint). As each Managed Server marks the endpoint URI offline, the alert rule condition based on Evaluate on any server is met and an alert is generated. When the endpoint URI is marked offline on the last of the servers in the cluster domain, the alert rule condition based on Evaluate on all servers is also met and this alert is also generated.
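
The difference between the two cases can be sketched as follows. The function alert_fires and the server names are hypothetical and exist only to illustrate the any-server and all-servers evaluation described above.

```python
def alert_fires(offline_by_server, evaluate_on):
    """Decide whether an offline-status alert condition is met (illustrative).

    offline_by_server maps a Managed Server name to True if that server
    currently has the endpoint URI marked offline.
    """
    states = list(offline_by_server.values())
    if evaluate_on == "any":
        return any(states)
    if evaluate_on == "all":
        return all(states)
    raise ValueError(f"unknown evaluation mode: {evaluate_on}")


# First case: a network problem on the machine hosting one Managed Server.
local_network_problem = {"ms1": True, "ms2": False, "ms3": False}

# Second case: the remote endpoint itself is not responding.
endpoint_down = {"ms1": True, "ms2": True, "ms3": True}
```

With these inputs, the first case meets only the any-server condition, while the second case meets both conditions once the last server has marked the URI offline.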

For a clustered domain:

  • When the Server field is set to Cluster or to one of the Managed Servers, Online status denotes that all of the endpoint URIs are online across the cluster or on the selected Managed Server, respectively.

  • When the Server field is set to Cluster or to one of the Managed Servers, Offline status denotes that all of the endpoint URIs are offline across the cluster or on the selected Managed Server, respectively.

  • When the Server field is set to Cluster, Partial status denotes that at least one of the endpoint URIs for the business service is offline on at least one of the servers, or that one of the endpoint URIs is offline on all the servers, but the other endpoint URIs for the same business service are still available on one or all the servers.

  • When the Server field is set to one of the Managed Servers, Partial status denotes that at least one of the endpoint URIs for the business service is offline on the selected Managed Server.
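
The status rules above can be summarized in a short sketch. The function names and the nested dictionary layout are assumptions made for illustration, and Partial is modeled here as "some, but not all, of the URIs in scope are offline".

```python
def service_status(offline_flags):
    """Map a set of per-URI offline flags to Online, Offline, or Partial."""
    flags = list(offline_flags)
    if not any(flags):
        return "Online"
    if all(flags):
        return "Offline"
    return "Partial"


def cluster_status(offline):
    """Status when the Server field is set to Cluster.

    offline maps server name -> {URI -> True if offline on that server}.
    """
    return service_status(
        flag for per_uri in offline.values() for flag in per_uri.values()
    )


def server_status(offline, server):
    """Status when the Server field is set to one Managed Server."""
    return service_status(offline[server].values())
```

For example, if one URI is offline on only one of two servers, this model reports Partial for the cluster and for that server, and Online for the other server.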

14.1.3 Metrics for Monitoring Endpoint URIs

Fusion Middleware Control displays endpoint URI metrics so you can monitor the health of your business services. The JMX monitoring APIs also let you view endpoint URI metrics. For information on using Fusion Middleware Control, see Viewing Endpoint URI Metrics for a Business Service. For information on using the JMX monitoring APIs, see JMX Monitoring API.

In Fusion Middleware Control, the endpoint URI metrics are available on the Dashboard tab for the business service on the Service Bus Project page. The available metrics include the state, message and error counts, and response times. The following items describe the expected behavior when you monitor endpoint URIs on Fusion Middleware Control:

  • Statistics are available only when you enable monitoring for a business service.

  • Renaming or moving a service resets the URI-level statistics.

  • Changing the aggregation interval resets all the URI-level statistics except the URI status.

  • Resetting statistics for the service (or resetting all statistics) resets all the URI-level statistics except the URI status.

  • Adding a new URI to an existing business service automatically initiates collecting the metrics for the new URI.

14.1.3.1 Endpoint URI State

The State statistic on the business service Dashboard of Fusion Middleware Control indicates whether the endpoint URI is online or offline. You can also obtain the status of an endpoint URI using the JMX monitoring APIs. Table 14-1 describes the possible states of an endpoint URI.

Table 14-1 Status of Endpoint URIs

Status    Description
--------  -----------
Online    Indicates that the URI is online on a given server. In a cluster, it indicates that the URI is online for all servers.
Offline   Indicates that the URI is offline on a given server. In a cluster, it indicates that the URI is offline for all servers.
Partial   Indicates that at least one server in the cluster reports a problem for that URI. This metric is available for clusters only.

Note:

When a URI is associated with more than one business service, the same endpoint URI can have a different status for each of the business services.

14.1.3.2 Endpoint URI Performance Metrics

The endpoint URI performance metrics show how many messages a given endpoint has processed, how many of them failed, and how long the endpoint took to respond. The following metrics help you monitor the health of the endpoint URIs:

  • Message Count: The number of messages processed by the endpoint URI.

  • Error Count: The number of errors encountered by the endpoint URI.

  • Minimum Response Time: The minimum time (in milliseconds) that this service has taken to execute messages.

  • Maximum Response Time: The maximum time (in milliseconds) that this service has taken to execute messages.

  • Average Response Time: The average time (in milliseconds) that this service has taken to execute messages.
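
As a rough illustration of how these five values relate to each other, the following sketch derives them from a list of individual message results. The function summarize and the sample format are hypothetical, and the sketch assumes that failed messages count toward the message count and the response times, which Service Bus may handle differently.

```python
def summarize(samples):
    """Compute illustrative endpoint URI metrics.

    samples is a list of (response_time_ms, succeeded) tuples for one URI.
    """
    times = [response_time for response_time, _ in samples]
    return {
        "message_count": len(samples),
        "error_count": sum(1 for _, succeeded in samples if not succeeded),
        "min_response_ms": min(times) if times else None,
        "max_response_ms": max(times) if times else None,
        "avg_response_ms": sum(times) / len(times) if times else None,
    }


# Three messages: two succeeded, one failed after 400 ms.
metrics = summarize([(120, True), (80, True), (400, False)])
```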

14.2 Configuring Service Bus to Take Unresponsive Endpoint URIs Offline

You can configure Service Bus to automatically mark an unresponsive endpoint offline to prevent continued attempts to reach the endpoint URI.

This can be a temporary state, based on a retry interval, or the endpoint URI can be taken offline permanently or until you manually bring the endpoint URI back online. To do so, you must enable the Offline Endpoint URIs operational setting for the business service. The offline URI settings for the business service apply to all URIs in the service.

You can also use APIs to mark an offline endpoint URI as online. This is useful when you have not enabled monitoring for a business service but you need to mark its endpoint URIs online. For more information, see com.bea.wli.monitoring.ServiceDomainMBean in the Java API Reference for Oracle Service Bus.

To configure Service Bus to mark an unresponsive endpoint URI offline:

  1. In Fusion Middleware Control Target Navigator, do one of the following:
    • Expand SOA, expand service-bus, and then click the project containing the business service whose URI you want to modify. On the Service Health tab, perform a search for and select the business service.

    • Expand SOA, select service-bus, and then click the Operations tab. Perform a search for and select the business service whose URI you want to modify.

    The Dashboard for the selected business service appears.

  2. Click the Properties tab.
  3. Under General Settings, select Offline Endpoint URIs.

    This configures Service Bus to mark the business service's endpoint URIs offline when they are not responding.

  4. Do one of the following:
    • To have Service Bus mark the endpoint URI offline temporarily, use the hours, mins, and secs fields to specify a retry interval. This is the time Service Bus will wait before attempting to access the same endpoint URI for subsequent message processing.

    • To have Service Bus mark the endpoint URI offline permanently (or until manual intervention), set the Retry Interval to 0 hours 0 mins 0 secs.

    Note:

    When you configure the endpoint URIs to be marked offline temporarily, the URI is kept offline for the specified time interval and then retried. If the endpoint responds, the URI comes back online; otherwise, it remains offline and the process repeats.

  5. To save your changes to the runtime, click Apply.

14.3 Marking an Endpoint URI Offline Manually

When you monitor a business service in Fusion Middleware Control, you can view metrics for its associated endpoint URIs.

If you notice any issues with a specific endpoint URI, you can mark the endpoint URI as offline to prevent repeated attempts to access that URI. When you take an endpoint URI offline manually, it remains offline until you manually bring it back up.

To mark an endpoint URI offline manually:

  1. In Fusion Middleware Control Target Navigator, do one of the following:
    • Expand SOA, expand service-bus, and then click the project containing the business service whose URI you want to modify. On the Service Health tab, perform a search for and select the business service.

    • Expand SOA, select service-bus, and then click the Operations tab. Perform a search for and select the business service whose URI you want to modify.

    The Dashboard for the selected business service appears.

  2. Scroll down to the Endpoint URIs section.
  3. Select the online endpoint URI you want to mark offline, and click Toggle URI State.

    The State column for the URI changes to Offline.

14.4 Marking an Offline URI as Online

When an endpoint URI is marked offline, either automatically by Service Bus or manually by an administrator, you can manually bring the endpoint URI back online once you have corrected the error that caused the URI to be non-responsive.

When you mark an endpoint URI as back online, Service Bus continues processing according to the business service endpoint URI configuration.

To mark an endpoint URI online manually:

  1. In Fusion Middleware Control Target Navigator, do one of the following:
    • Expand SOA, expand service-bus, and then click the project containing the business service whose URI you want to modify. On the Service Health tab, perform a search for and select the business service.

    • Expand SOA, select service-bus, and then click the Operations tab. Perform a search for and select the business service whose URI you want to modify.

    The Dashboard for the selected business service appears.

  2. Scroll down to the Endpoint URIs section.
  3. Select the offline endpoint URI you want to mark online, and click Toggle URI State.

    The State column for the URI changes to Online.

  4. To have Service Bus bring the endpoint URI back online after it is marked offline, follow the steps under Configuring Service Bus to Take Unresponsive Endpoint URIs Offline.

14.5 Viewing Endpoint URI Metrics for a Business Service

Service Bus collects information about how each endpoint URI is processing messages. You can view message counts, error counts, and the minimum, maximum, and average response times. The business service Dashboard also shows whether the endpoint URI is online or offline.

To view endpoint metrics for a business service:

  1. In Fusion Middleware Control Target Navigator, do one of the following:
    • Expand SOA, expand service-bus, and then click the project containing the business service whose endpoint URI metrics you want to view. On the Service Health tab, perform a search for and select the business service.

    • Expand SOA, select service-bus, and then click the Operations tab. Perform a search for and select the business service whose endpoint URI metrics you want to view.

    The Dashboard for the selected business service appears.

  2. In the Display Statistics field, select whether to view the statistics for the current aggregation interval or the statistics since the last reset.
  3. Scroll down to the Endpoint URIs section.
  4. View the metrics for each endpoint URI in the table.

    For more information about the metrics displayed, see Metrics for Monitoring Endpoint URIs and the online help for Fusion Middleware Control.

  5. To change the status of an endpoint URI, see Marking an Endpoint URI Offline Manually or Marking an Offline URI as Online.

14.6 Creating Alerts Based on Endpoint URI Metrics

If an endpoint URI is not accessible, the business service trying to access it receives a communication error.

In addition to configuring a business service to take a non-responsive URI offline, as described in Configuring Service Bus to Take Unresponsive Endpoint URIs Offline, you can raise an alert when a system encounters a non-responsive URI by configuring SLA alert rules for a business service based on the endpoint URI status.

14.6.1 About Creating an SLA Alert Based on Endpoint URI Status

When you create an SLA alert based on a business service's endpoint URI status, an alert is generated when any endpoint URI or all endpoint URIs change state from online to offline, or from offline to online. For example, consider a business service for which two alert rules are configured, one based on All URIs offline = True condition and another on Any URI offline = True condition. If an alert based on All URIs offline = True condition is generated then it signifies a severe problem because all requests to this service are likely to fail until the situation is resolved. However, if an alert based on Any URI offline = True is generated, it implies that the other endpoint URIs are responsive and subsequent requests may not fail.

All alert rules are independently evaluated. If alerts based on both (any or all URI) clauses have been configured for the same business service, it is likely that both alerts are generated simultaneously when the last endpoint URI is marked offline. If a business service has only one URI, the All URIs offline = True and Any URI offline = True clauses mean the same thing and so they behave in an identical manner.

The evaluation of an alert rule condition based on a transition from offline to online behaves in a similar fashion except that it tracks any or all endpoint URIs being marked back to online state.
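
The any and all clauses can be sketched as simple predicates over the per-URI status. The function condition_met and the URI names are hypothetical; the condition labels follow the wording used in this section.

```python
def condition_met(condition, offline_by_uri, expected=True):
    """Evaluate a status condition for one business service (illustrative).

    offline_by_uri maps each endpoint URI to True if it is currently offline.
    expected models the True or False value the rule compares against.
    """
    flags = list(offline_by_uri.values())
    checks = {
        "All URIs offline": all(flags),
        "All URIs online": not any(flags),
        "Any URI offline": any(flags),
        "Any URI online": not all(flags),
    }
    return checks[condition] == expected


one_down = {"uri-a": True, "uri-b": False}
both_down = {"uri-a": True, "uri-b": True}
single_uri_down = {"uri-a": True}
```

In this model, one_down meets only the Any URI offline condition, both_down meets the any and the all condition at the same time, and for single_uri_down the two conditions always agree.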

14.6.2 Creating an SLA Alert Based on Endpoint URI Status

You can create an alert rule based on an endpoint URI's status.

To create an SLA alert based on endpoint URI status:

  1. Create an SLA alert rule for the business service as described in Configuring SLA Alert Rule Properties.
  2. On the Rule Condition page of the Create SLA Alert Rule wizard, select the time period for the Condition Aggregation Interval.

    For more information, see Aggregation Intervals.

  3. If there is no template row in the table, click Add a New Condition above the Condition Builder table.

    A new row appears in the table.

  4. In the first field, select Status.
  5. In the next field, select one of the following to indicate the status condition that will generate an alert:
    • All URIs offline

    • All URIs online

    • Any URIs offline

    • Any URIs online

  6. In the next field, the = operator is selected and is the only available option.
  7. In the next field, select either True or False, depending on how you want to evaluate the condition.
  8. In the last field, select one of the following:
    • Evaluate on all servers: With this option, the rule evaluates to true only if the condition is met on all servers.

    • Evaluate on any server: With this option, the rule evaluates to true if the condition is met on any server.

  9. To the left of the row, click Update the Condition.
  10. Click Create.

    The new alert rule appears in the summary table.

Note:

To ensure that you do not miss any alerts triggered due to frequent changes in the status of the URI, Oracle recommends that you set the aggregation interval for alert rules based on the status of the URI to one minute. For more information on aggregation intervals, see Introduction to Aggregation Intervals.

14.6.3 Configuring an Alert Rule Based on Endpoint URI Statistics

You can create an alert rule based on an endpoint URI's message count, error count, or response time.

To configure an alert rule based on endpoint URI statistics:

  1. Create an SLA alert rule for the business service as described in Configuring SLA Alert Rule Properties.
  2. On the Rule Condition page of the Create SLA Alert Rule wizard, select the time period for the Condition Aggregation Interval.

    For more information, see Aggregation Intervals.

  3. If there is no template row in the table, click Add a New Condition above the Condition Builder table.

    A new row appears in the table.

  4. In the first field, select one of the following:
    • Count: To base the condition on the endpoint URI's message count or error count.

    • Minimum: To base the condition on the endpoint URI's minimum response time.

    • Maximum: To base the condition on the endpoint URI's maximum response time.

    • Average: To base the condition on the endpoint URI's average response time.

  5. In the next field, select the URI for which you are creating the condition.
  6. In the next field, select a comparison operator: =, !=, > or <.
  7. In the next field, enter the value to compare the actual statistic against.
  8. To the left of the row, click Update the Condition.
  9. Click Create.

    The new alert rule appears in the summary table.
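
Conceptually, a condition built with the steps above compares one statistic for the selected URI against the value you enter, using the chosen operator. The following sketch illustrates that comparison only; the function statistic_condition_met and the sample numbers are hypothetical and do not represent the Service Bus rule engine.

```python
import operator

# The four comparison operators offered in the Condition Builder.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def statistic_condition_met(actual, op, threshold):
    """Return True if the actual statistic satisfies `actual <op> threshold`."""
    return OPERATORS[op](actual, threshold)


# Example: an average response time of 1250 ms against a 1000 ms threshold.
slow_endpoint = statistic_condition_met(1250, ">", 1000)
```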