Health Check Management

Learn how to understand and use health status indicators to report on the general health of your Load Balancer resources and their components.

A health check is a test to confirm the availability of backend servers. A health check can be a request or a connection attempt. Based on a time interval you specify, the load balancer applies the health check policy to continuously monitor backend servers. If a server fails the health check, the load balancer takes the server temporarily out of rotation. If the server later passes the health check, the load balancer returns it to the rotation.

You configure your health check policy when you create a backend set. You can configure TCP-level or HTTP-level health checks for your backend servers.

  • TCP-level health checks attempt to make a TCP connection with the backend servers and validate the response based on the connection status.

  • HTTP-level health checks send requests to the backend servers at a specific URI and validate the response based on the status code or entity data (body) returned.
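
The two check types above can be sketched with the Python standard library. This is a minimal illustration, not the service's implementation: the function names, the default timeout, and the use of re.search for body matching are assumptions made for the example.

```python
import http.client
import re
import socket


def tcp_check(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host, port, path="/", expected_status=200,
               body_regex=None, timeout=3.0):
    """Return True if GET `path` returns `expected_status` and, when a
    pattern is given, the body matches it (re.search is an assumption)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
    if response.status != expected_status:
        return False
    return body_regex is None or re.search(body_regex, body) is not None
```

A TCP check only proves that something accepts connections on the port, while the HTTP check also inspects the status code and, optionally, the body.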

The service provides application-specific health check capabilities to help you increase availability and reduce your application maintenance window.

The backend set's Details page provides the same Overall Health status indicator found in the load balancer's list of backend sets. It also includes counters for the Backend Health status values reported by the backend set's backend servers.

The health status counter badges indicate the following:

  • The number of child entities reporting the indicated health status level.

  • If a counter corresponds to the overall health, the badge has a fill color.

  • If a counter has a zero value, the badge has a light gray outline and no fill color.

Health Status Indicators

Learn about the different health status indicators for Load Balancer resources.

The Load Balancing service provides health status indicators that use your health check policies to report on the general health of your load balancers and their components.

The following table provides the general meaning of each level:

Level | Color | Description

Critical | Red | Some or all reporting entities require immediate attention. The resource is not functioning or unexpected failure is imminent.

Warning | Yellow | Some reporting entities require attention. The resource is not functioning at peak efficiency or the resource is incomplete and requires further work.

Incomplete | Yellow | The load balancer does not have any backend sets configured or backend sets exist that contain no attached backend servers.

Pending | Yellow | The health status cannot be determined. The resource is not responding or is in transition and might resolve to another status over time.

OK | Green | No attention required. The resource is functioning as expected.

The precise meaning of each level differs among the following components:

Understanding Health Issues

Learn more about how health issues affect a Load Balancer resource.

At the highest level, load balancer health reflects the health of its components. The health status indicators provide information you might need to drill down and investigate an existing issue. Some common issues that the health status indicators can help you detect and correct include:

A health check is misconfigured.

In this case, all the backend servers for one or more of the affected listeners report as unhealthy. If your investigation finds that the backend servers do not have problems, then a backend set probably includes a misconfigured health check.

A listener is misconfigured.

All the backend server health status indicators report OK, but the load balancer does not pass traffic on a listener.

The listener might be configured to:

  • Listen on the wrong port.

  • Use the wrong protocol.

  • Use the wrong policy.

If your investigation shows that the listener is not at fault, check the security list configuration.

A security rule is misconfigured.

Health status indicators help you diagnose two cases of misconfigured security rules:

  • All entity health status indicators report OK, but traffic does not flow (as with misconfigured listeners). If the listener is not at fault, check the security rule configuration.

  • All entity health statuses report as unhealthy. You have checked your health check configuration and your services run properly on your backend servers.

    In this case, your security rules might not include the IP range for the source of the health check requests. You can find the health check source IP on the Details page for each backend server. You can also use the API to find the IP in the sourceIpAddress field of the HealthCheckResult object.

    Note

    Source IP

    The source IP for health check requests comes from a compute instance managed by the Load Balancing service.

One or more of the backend servers reports as unhealthy.

A backend server might be unhealthy or the health check might be misconfigured. To see the corresponding error code, check the status field on the backend server's Details page. You can also use the API to find the error code in the healthCheckStatus field of the HealthCheckResult object.
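
As a rough sketch of how the two HealthCheckResult fields mentioned above might be used together, the following function scans a decoded backend health response. The sourceIpAddress and healthCheckStatus field names come from this document. The healthCheckResults container key, the helper name, and the sample values are assumptions for illustration only.

```python
def summarize_health_results(backend_health):
    """Collect health check source IPs and non-OK statuses."""
    results = backend_health.get("healthCheckResults", [])
    source_ips = sorted({r["sourceIpAddress"] for r in results
                         if r.get("sourceIpAddress")})
    failures = [(r.get("sourceIpAddress"), r["healthCheckStatus"])
                for r in results
                if r.get("healthCheckStatus", "OK") != "OK"]
    return source_ips, failures


# Illustrative sample only; not captured from a real load balancer.
sample = {
    "healthCheckResults": [
        {"sourceIpAddress": "10.0.0.7", "healthCheckStatus": "OK"},
        {"sourceIpAddress": "10.0.0.8",
         "healthCheckStatus": "INVALID_STATUS_CODE"},
    ]
}

ips, failed = summarize_health_results(sample)
print("Health check sources to allow in security rules:", ips)
print("Failing checks:", failed)
```

The list of source IPs is what you would compare against your security rules. The failing statuses point to whether the backend or the health check configuration needs attention.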

Other cases in which health status might prove helpful include:

  • VCN network security groups or security lists block traffic.

  • Compute instances have misconfigured route tables.

Health status is updated every three minutes. No finer granularity is available.

Health status does not provide historical health data.

Common Side Effects of Health Check Misconfiguration

Learn about common side effects of health check misconfiguration for Load Balancer resources.

The following are common side effects of health check misconfiguration. Use them to troubleshoot issues.

  • Wrong Port

    In this scenario, all backend servers are reported as unhealthy. If the backend servers do not have any problems, you might have made a mistake setting the port. The port must be a port that is listening and has allowed traffic on the backend.

    OCI Logging Error: errno:EHOSTUNREACH, syscall:connect

  • Wrong Path

    In this scenario, all the backend servers are reported as unhealthy. If the backend servers do not have any problems, you might have made a mistake setting the path for the HTTP health check. The path needs to match an actual application endpoint on the backend. In this scenario, you can use the curl utility to test from a system in the same network. For example: $ curl -i http://backend_ip_address/health

    If the path is correct, you receive the configured status code in the response.

    OCI Logging Error: "msg":"invalid statusCode","statusCode":404,"expected":"200"

  • Wrong Protocol

    In this scenario, all the backend servers are reported as unhealthy. If the backend servers do not have any problems, you might have made a mistake setting the protocol. The protocol needs to match the protocol that is listening on the backend. The service supports only TCP and HTTP health checks, so if your backend uses HTTPS, use TCP as the protocol.

    OCI Logging Error: code:EPROTO, errno:EPROTO

  • Wrong Status Code

    In this scenario, all the backend servers are reported as unhealthy. If the backend servers do not have any problems, the status code configured for the HTTP health check might not match the status code that the backend actually returns. A common scenario is when a backend returns a 302 status code but you are expecting a 200 status code. This result likely means that the backend is redirecting you to a login page or another location on the server. In this scenario, you can either fix the backend to return the expected code or use 302 in your health check configuration.

    OCI Logging Error: msg:invalid statusCode, statusCode:nnn,expected:200 where nnn is the status code that is returned.

  • Wrong Regex Pattern

    All the backend servers report as unhealthy. If the backend servers do not have any problems, you might have set a regex pattern that does not match the body, or the backend is not returning the expected body. In this scenario, you can either change the backend to match the pattern or correct the pattern to match the backend. The following are some specific pattern examples.

    • Any Content - .*

    • A page returning the value Status:OK: - Status:OK:.*

    OCI Logging Error: response match result: failed

  • Misconfigured Network Security Groups, Security Lists, or Local Firewall

All or some of the backend servers report as unhealthy. If the backend servers do not have any problems, then you might have improperly configured the network security groups, security lists, or local firewalls (such as firewalld, iptables, or SELinux). In this scenario, you can use either the curl or netcat utility to test from a system that belongs to the same subnet and network security group as your load balancer instance. For example:

  • HTTP: $ curl -i http://backend_ip_address/health

  • TCP: $ nc -zvw3 backend_ip_address 443

You can check your local firewall by using the following command: firewall-cmd --list-all --zone=public. If your firewall is missing the expected rules, then you can use a command set like this to add the service (this example is for HTTP port 80):

  • firewall-cmd --zone=public --add-service=http

  • firewall-cmd --zone=public --permanent --add-service=http
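
The OCI Logging errors quoted in this section can be turned into a small lookup for first-pass triage. The sketch below is only a convenience built from the strings shown above. The function name and the hint wording are illustrative, and a match is a starting point for investigation, not a diagnosis.

```python
# Substrings taken from the OCI Logging errors quoted above, mapped to
# the misconfiguration each one is associated with in this section.
ERROR_HINTS = [
    ("EHOSTUNREACH",
     "wrong port (the port is not listening or does not allow traffic)"),
    ("EPROTO", "wrong protocol"),
    ("invalid statusCode", "wrong path or wrong status code"),
    ("response match result: failed",
     "wrong regex pattern or unexpected response body"),
]


def triage(log_line):
    """Return the likely cause for a health check log line, or None."""
    for needle, hint in ERROR_HINTS:
        if needle in log_line:
            return hint
    return None


print(triage("errno:EHOSTUNREACH, syscall:connect"))
```

An error that matches none of the entries returns None, which is a cue to check network security groups, security lists, and local firewalls as described above.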

Health Check Best Practices

Learn about health check best practices for a Load Balancer resource.

Configure your health check protocol to match your application or service. If you run an HTTP service, then configure an HTTP-level health check. If you run a TCP-level health check against an HTTP service, then you might not get an accurate response. The TCP handshake can succeed and indicate that the service is up even when the HTTP service is incorrectly configured or having other issues. Although the health check appears good, you might experience transaction failures.

For example:

  • The backend HTTP service has issues when communicating with the health check URL and the health check URL returns 5nn messages. An HTTP health check catches the message from the health check URL and marks the service as down. In this case, a TCP health check handshake succeeds and marks the service as healthy, even though the HTTP service might not be usable.

  • The backend HTTP service responds with 4nn messages because of authorization issues or no configured content. A TCP health check does not catch these errors.
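
The difference between the two check types can be reproduced locally. The following sketch starts a stand-in backend that accepts connections but always answers 503, then probes it both ways. The handler, the /health path, and the helper names are assumptions for the demonstration, not part of the service.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class BrokenHandler(BaseHTTPRequestHandler):
    """Stand-in backend: reachable over TCP, but every GET returns 503."""

    def do_GET(self):
        self.send_response(503)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def tcp_check(host, port):
    """Return True if the TCP handshake succeeds."""
    try:
        with socket.create_connection((host, port), timeout=3):
            return True
    except OSError:
        return False


def http_status(host, port, path="/health"):
    """Return the HTTP status code for GET `path`."""
    conn = http.client.HTTPConnection(host, port, timeout=3)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


server = ThreadingHTTPServer(("127.0.0.1", 0), BrokenHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

tcp_ok = tcp_check("127.0.0.1", port)
status = http_status("127.0.0.1", port)

server.shutdown()
server.server_close()

print("TCP check passed:", tcp_ok)
print("HTTP check passed:", status == 200, "(status", str(status) + ")")
```

The TCP probe reports the backend as reachable even though every request fails, which is the situation the HTTP-level check is meant to catch.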

Creating a Custom Health Check Page

Create a custom health check page for Load Balancer resources.

In many scenarios, you might want to expose your own custom health check page to do a more thorough check. One approach is to run a small dedicated health check application rather than relying on your existing application. The following example uses Tornado with the py-healthcheck library (https://pypi.org/project/py-healthcheck/), which also supports Flask.

import tornado.web
from healthcheck import TornadoHandler, HealthCheck

# Add your own check function to the health check.
# _redis_client() stands for your application's own helper that returns a Redis client.
def redis_available():
    client = _redis_client()
    client.info()  # expected to raise an exception if Redis does not respond
    return True, "Redis Test Pass"

health = HealthCheck(checkers=[redis_available])

# Serve the health check at /healthcheck.
app = tornado.web.Application([
    ("/healthcheck", TornadoHandler, dict(checker=health)),
])

In the preceding example, the test page does more than confirm that the HTTP application is listening. It queries a Redis client and waits for a response to ensure that the full application is healthy before returning a 200 status code. Other examples include checking for disk space or the availability of an upstream dependency. In your health check configuration, specify the following:
  • /healthcheck as your path

  • The port your health check application listens on as port (5000 is the Flask default)

  • 200 as status code
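
The disk space check mentioned above could be written as another checker function. This is a sketch using only the standard library. The function name, the path, and the 1 GiB threshold are hypothetical choices, and the (bool, message) return shape simply mirrors redis_available in the earlier example.

```python
import shutil


def disk_space_available(path="/", min_free_bytes=1024 ** 3):
    """Return (ok, message) depending on free space under `path`."""
    usage = shutil.disk_usage(path)
    if usage.free < min_free_bytes:
        return False, f"Low disk space: {usage.free} bytes free on {path}"
    return True, f"Disk OK: {usage.free} bytes free on {path}"


ok, message = disk_space_available(".", min_free_bytes=0)
print(ok, message)
```

A checker like this could then be listed next to redis_available in the checkers argument of HealthCheck, so that the page reports healthy only when every check passes.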

Getting Health Check Policy Details

Get the details of a health check policy for a Load Balancer resource and backend set.

Use one of the following methods to get the details of a health check policy for a selected load balancer and backend set.

To get the details of a health check policy using the Console

Use the OCI Console to get the details of a health check policy for a Load Balancer resource and backend set.

  1. Open the navigation menu, click Networking, and then click Load Balancers.

  2. Select the Compartment from the list.

    All load balancers and network load balancers in that compartment are listed in tabular form.

  3. (optional) Select a State from the list to limit the load balancers displayed to that state.

  4. (optional) Uncheck Network Load Balancer under Type to only display load balancers.

  5. Select the load balancer whose health check policy details you want to get.

    The Load Balancer Details dialog box appears.

  6. Click Backend Sets under Resources.

    The Backend Sets list appears. All backend sets are listed in tabular form.

  7. Click the backend set whose details you want to get.

    The Backend Set Details dialog box appears.

  8. Click Update Health Check.

    Alternatively, click the Actions icon (Action icon) for the backend set whose health check you want to update, and then click Update Health Check.

    The Update Health Check dialog box appears.

To get the details of a health check policy using the CLI

Use the command line interface (CLI) to get the details of a health check policy for a Load Balancer resource and backend set.

Enter the following command:

oci lb health-checker get --backend-set-name backend_set_name --load-balancer-id load_balancer_id [OPTIONS]

See the CLI online help for a list of options:

oci lb health-checker get --help

See oci lb health-checker get for a complete description of the command.
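
If you script this lookup, the command above can be assembled and run from Python. The sketch assumes that the OCI CLI is installed and configured and that it prints JSON to standard output. The function names and the sample OCID are placeholders invented for the example.

```python
import json
import subprocess


def health_checker_get_command(load_balancer_id, backend_set_name):
    """Build the argument list for `oci lb health-checker get`."""
    return [
        "oci", "lb", "health-checker", "get",
        "--backend-set-name", backend_set_name,
        "--load-balancer-id", load_balancer_id,
    ]


def get_health_checker(load_balancer_id, backend_set_name):
    """Run the CLI and return its output decoded from JSON.

    Assumes the OCI CLI is on PATH, is configured, and prints JSON.
    """
    command = health_checker_get_command(load_balancer_id, backend_set_name)
    completed = subprocess.run(command, capture_output=True,
                               text=True, check=True)
    return json.loads(completed.stdout)


# Placeholder identifiers; replace them with your own values.
print(" ".join(health_checker_get_command(
    "ocid1.loadbalancer.oc1..example", "my_backend_set")))
```

Passing the arguments as a list avoids shell quoting problems when a backend set name contains unusual characters.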

To get the details of a health check policy using the API

Use the API to get the details of a health check policy for a Load Balancer resource and backend set.

Run the GetHealthChecker method to display the details of a health check policy of a backend set for a load balancer. See GetHealthChecker for a complete description.
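
Once the response is decoded from JSON, it can help to map it onto a typed structure. The following sketch is an assumption-heavy illustration: the camelCase keys are modeled on the CLI option names rather than confirmed here, the class and function names are hypothetical, and the fallback values are the defaults this document lists for the Console fields. Check the GetHealthChecker reference for the actual response shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthCheckPolicy:
    protocol: str
    port: int
    interval_ms: int
    timeout_ms: int
    retries: int
    url_path: Optional[str] = None
    return_code: Optional[int] = None
    response_body_regex: Optional[str] = None


def parse_policy(payload):
    """Map a decoded response dict onto HealthCheckPolicy.

    The key names are assumptions; adjust them to the real response.
    """
    return HealthCheckPolicy(
        protocol=payload["protocol"],
        port=payload.get("port", 0),  # 0 means the backend's traffic port
        interval_ms=payload.get("intervalInMillis", 10000),
        timeout_ms=payload.get("timeoutInMillis", 3000),
        retries=payload.get("retries", 3),
        url_path=payload.get("urlPath"),
        return_code=payload.get("returnCode"),
        response_body_regex=payload.get("responseBodyRegex"),
    )


# Illustrative payload only.
policy = parse_policy({"protocol": "HTTP", "port": 8080,
                       "urlPath": "/healthcheck", "returnCode": 200})
print(policy)
```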

Editing Health Check Policies

Update the health check policy for a Load Balancer resource and backend set.

Use one of the following methods to edit and update the health check policy for a selected load balancer and backend set.

To edit a health check policy using the Console

Use the OCI Console to update a health check policy of a backend set for a Load Balancer resource.

  1. Open the navigation menu, click Networking, and then click Load Balancers.

  2. Select the Compartment from the list.

    All load balancers and network load balancers in that compartment are listed in tabular form.

  3. (optional) Select a State from the list to limit the load balancers displayed to that state.

  4. (optional) Uncheck Network Load Balancer under Type to only display load balancers.

  5. Select the load balancer whose health check policies you want to edit.

    The Load Balancer Details dialog box appears.

  6. Click Backend Sets under Resources.

    The Backend Sets list appears. All backend sets are listed in tabular form.

  7. Click the backend set whose health check policy you want to edit.

    The Backend Set Details dialog box appears.

  8. Click Update Health Check.

    Alternatively, click the Actions icon (Action icon) for the backend set whose health check you want to update, and then click Update Health Check.

    The Update Health Check dialog box appears.

  9. Update any of the following:

    • Protocol: Required. Specify the protocol:

      • HTTP

      • TCP

      Important

      Configure your health check protocol to match your application or service. See Health Check Management for more information.

    • Port: Optional. Specify the backend server port against which to run the health check.

      Tip

      You can enter the value '0' to have the health check use the backend server's traffic port.

    • Interval in MS: Optional. Specify how frequently to run the health check, in milliseconds. The default is 10000 (10 seconds).

    • Timeout in MS: Optional. Specify the maximum time in milliseconds to wait for a reply to a health check. A health check is successful only if a reply returns within this timeout period. The default is 3000 (3 seconds).

    • Number of retries: Optional. Specify the number of retries to attempt before a backend server is considered "unhealthy." This number also applies when recovering a server to the "healthy" state. The default is 3.

    • Status Code: (HTTP only) Optional. Specify the status code a healthy backend server must return.

    • URL Path (URI): (HTTP only) Required. Specify a URL endpoint against which to run the health check.

    • Response Body Regex: (HTTP only) Optional. Provide a regular expression for parsing the response body from the backend server.

  10. Click Save Changes.
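
To see how the retries setting interacts with individual check results, consider the following sketch. It assumes that a backend changes state only after the configured number of consecutive results that contradict its current state, in both directions. The service's exact counting rules are not specified here, so treat the class and its behavior as an illustration.

```python
class BackendHealthTracker:
    """Track one backend's state from a stream of check results.

    Assumption: the state flips only after `retries` consecutive
    results that disagree with the current state.
    """

    def __init__(self, retries=3, healthy=True):
        self.retries = retries
        self.healthy = healthy
        self._streak = 0  # consecutive results contradicting the state

    def record(self, check_passed):
        """Record one check result and return the current state."""
        if check_passed == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.retries:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


tracker = BackendHealthTracker(retries=3)
checks = [False, False, True, False, False, False, True, True, True]
states = [tracker.record(result) for result in checks]
print(states)
```

Under this assumption and the default values of a 10000 ms interval and 3 retries, a failing backend would leave rotation roughly 30 seconds after its first failed check, and a single passing check in between resets the count.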

To edit a health check policy using the CLI

Use the command line interface (CLI) to update a health check policy of a backend set for a Load Balancer resource.

Enter the following command:

oci lb health-checker update --backend-set-name backend_set_name --load-balancer-id load_balancer_id --interval-in-millis interval_in_millis --port port --protocol protocol --response-body-regex response_body_regex --retries retries --return-code return_code --timeout-in-millis timeout_in_millis [OPTIONS]

See the CLI online help for a list of options:

oci lb health-checker update --help

See oci lb health-checker update for a complete description of the command.
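
When the update is scripted, the argument list can be assembled with a few sanity checks first. The sketch below uses only the options shown in the command above and applies the HTTP-only rule from the Console field descriptions. The function name is hypothetical, and which options the CLI treats as required is not covered here, so confirm that with the --help output.

```python
def health_checker_update_command(load_balancer_id, backend_set_name,
                                  protocol, port=None,
                                  interval_in_millis=None,
                                  timeout_in_millis=None, retries=None,
                                  return_code=None,
                                  response_body_regex=None):
    """Build the argument list for `oci lb health-checker update`."""
    protocol = protocol.upper()
    if protocol not in ("HTTP", "TCP"):
        raise ValueError("protocol must be HTTP or TCP")
    if protocol == "TCP" and (return_code is not None
                              or response_body_regex is not None):
        raise ValueError(
            "return code and response body regex apply to HTTP only")

    command = [
        "oci", "lb", "health-checker", "update",
        "--backend-set-name", backend_set_name,
        "--load-balancer-id", load_balancer_id,
        "--protocol", protocol,
    ]
    optional = {
        "--port": port,
        "--interval-in-millis": interval_in_millis,
        "--timeout-in-millis": timeout_in_millis,
        "--retries": retries,
        "--return-code": return_code,
        "--response-body-regex": response_body_regex,
    }
    # Only pass the options that were actually supplied.
    for flag, value in optional.items():
        if value is not None:
            command += [flag, str(value)]
    return command


# Placeholder identifiers; replace them with your own values.
print(health_checker_update_command(
    "ocid1.loadbalancer.oc1..example", "my_backend_set", "http",
    port=8080, return_code=200))
```

The resulting list can be passed to subprocess.run in the same way as any other CLI invocation.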

To edit a health check policy using the API

Use the API to update a health check policy of a backend set for a Load Balancer resource.

Run the UpdateHealthChecker method to edit a health check policy of a backend set for a load balancer. See UpdateHealthChecker for a complete description.