Editing Health Check Policies

This topic describes how to modify health check policies for a backend set.

Required IAM Policy

To use Oracle Cloud Infrastructure, you must be granted security access in a policy by an administrator. This access is required whether you're using the Console or the REST API with an SDK, CLI, or other tool. If you get a message that you don’t have permission or are unauthorized, verify with your administrator what type of access you have and which compartment you should work in.

For administrators: For a typical policy that gives access to load balancers and their components, see Let network admins manage load balancers.

Also, be aware that a policy statement with inspect load-balancers gives the specified group the ability to see all information about the load balancers. For more information, see Details for Load Balancing.

If you're new to policies, see Getting Started with Policies and Common Policies.

Working with Health Check Policies

A health check is a test to confirm the availability of backend servers. A health check can be a request or a connection attempt. Based on a time interval you specify, the load balancer applies the health check policy to continuously monitor backend servers. If a server fails the health check, the load balancer takes the server temporarily out of rotation. If the server subsequently passes the health check, the load balancer returns it to the rotation.
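The rotation behavior described above can be sketched as a small state tracker. This is only an illustration, not the service's implementation: the class name RotationTracker, the threshold parameter, and the assumption that consecutive results are counted before the state flips are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RotationTracker:
    """Tracks whether one backend server is in rotation (hypothetical model)."""

    threshold: int = 3        # assumed: consecutive results needed to flip state
    in_rotation: bool = True
    _streak: int = 0          # consecutive results that contradict the current state

    def record(self, passed: bool) -> bool:
        """Record one health check result and return the new rotation state."""
        if passed == self.in_rotation:
            # The result agrees with the current state, so reset the streak.
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.threshold:
                self.in_rotation = passed
                self._streak = 0
        return self.in_rotation


tracker = RotationTracker(threshold=3)
history = [tracker.record(r) for r in [False, False, False, True, True, True]]
print(history)
```

In this model, three failed checks in a row take the server out of rotation, and three passed checks in a row return it.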

You configure your health check policy when you create a backend set. You can configure TCP-level or HTTP-level health checks for your backend servers.

  • TCP-level health checks attempt to make a TCP connection with the backend servers and validate the response based on the connection status.
  • HTTP-level health checks send requests to the backend servers at a specific URI and validate the response based on the status code or entity data (body) returned.

The service provides application-specific health check capabilities to help you increase availability and reduce your application maintenance window.

Important

Configure your health check protocol to match your application or service.

If you run an HTTP service, be sure to configure an HTTP-level health check. If you run a TCP-level health check against an HTTP service, you might not get an accurate response. The TCP handshake can succeed and indicate that the service is up even when the HTTP service is incorrectly configured or having other issues. Although the health check appears good, customers might experience transaction failures. For example:

  • The backend HTTP service has issues when talking to the health check URL and the health check URL returns 5XX messages. An HTTP health check catches the message from the health check URL and marks the service as down. In this case, a TCP health check handshake succeeds and marks the service as healthy, even though the HTTP service may not be usable.

  • The backend HTTP service responds with 4XX messages because of authorization issues or no configured content. A TCP health check does not catch these errors.
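The first case can be reproduced locally with a minimal sketch. It starts a throwaway HTTP server that always answers 503 and then probes it twice. The names FailingHandler, tcp_check, and http_check are hypothetical, and the probes only approximate what a load balancer does; they are not the service's own implementation.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FailingHandler(BaseHTTPRequestHandler):
    """Hypothetical backend whose health URL always returns 503."""

    def do_GET(self):
        self.send_response(503)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def tcp_check(host, port, timeout=3.0):
    """Pass if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host, port, path="/health", expected_status=200, timeout=3.0):
    """Pass only if the backend returns the expected status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == expected_status
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), FailingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

tcp_result = tcp_check(host, port)
http_result = http_check(host, port)
server.shutdown()
server.server_close()
print(f"TCP check passed: {tcp_result}, HTTP check passed: {http_result}")
```

The TCP probe passes because the socket accepts connections, while the HTTP probe fails because the status code is not the expected 200.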

Health Status

The Load Balancing service provides health status indicators that use your health check policies to report on the general health of your load balancers and their components. You can see health status indicators on the Console List and Details pages for load balancers, backend sets, and backend servers. You also can use the Load Balancing API to retrieve this information.

Health status indicators have four levels. The following table provides the general meaning of each level:

Level | Color | Description
OK | Green | No attention required. The resource is functioning as expected.
Warning | Yellow | Some reporting entities require attention. The resource is not functioning at peak efficiency or the resource is incomplete and requires further work.
Critical | Red | Some or all reporting entities require immediate attention. The resource is not functioning or unexpected failure is imminent.
Unknown | Gray | Health status cannot be determined. The resource is not responding or is in transition and might resolve to another status over time.

The precise meaning of each level differs among load balancers, backend sets, and backend servers.

Using Health Status

At the highest level, load balancer health reflects the health of its components. The health status indicators provide information you might need to drill down and investigate an existing issue. Some common issues that the health status indicators can help you detect and correct include:

A health check is misconfigured.

In this case, all the backend servers for one or more of the affected listeners report as unhealthy. If your investigation finds that the backend servers do not have problems, then a backend set probably includes a misconfigured health check.

A listener is misconfigured.

All the backend server health status indicators report OK, but the load balancer does not pass traffic on a listener.

The listener might be configured to:

  • Listen on the wrong port.
  • Use the wrong protocol.
  • Use the wrong policy.

If your investigation shows that the listener is not at fault, check the security list configuration.

A security rule is misconfigured.

Health status indicators help you diagnose two cases of misconfigured security rules:

  • All entity health status indicators report OK, but traffic does not flow (as with misconfigured listeners). If the listener is not at fault, check the security rule configuration.
  • All entity health statuses report as unhealthy. You have checked your health check configuration and your services run properly on your backend servers.

    In this case, your security rules might not include the IP range for the source of the health check requests. You can find the health check source IP on the Details page for each backend server. You can also use the API to find the IP in the sourceIpAddress field of the HealthCheckResult object.

    Note

    Source IP

    The source IP for health check requests comes from a Compute instance managed by the Load Balancing service.

One or more of the backend servers reports as unhealthy.

A backend server might be unhealthy or the health check might be misconfigured. To see the corresponding error code, check the status field on the backend server's Details page. You can also use the API to find the error code in the healthCheckStatus field of the HealthCheckResult object.
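Once you have retrieved health check results through the API, the two fields named in this topic (sourceIpAddress and healthCheckStatus) can be summarized with a few lines of code. The following is a minimal sketch that works on plain dictionaries shaped like HealthCheckResult objects. The function name summarize_results is hypothetical, the sample records are invented for illustration, and the status strings are assumed values; the sketch does not call the Load Balancing API.

```python
from collections import Counter


def summarize_results(results):
    """Collect source IPs and count non-OK statuses from health check results.

    `results` is a list of dicts shaped like HealthCheckResult objects.
    Only the two fields named in this topic are read.
    """
    source_ips = sorted({r["sourceIpAddress"] for r in results})
    failures = Counter(
        r["healthCheckStatus"] for r in results if r["healthCheckStatus"] != "OK"
    )
    return source_ips, dict(failures)


# Illustrative sample data, not real API output.
sample = [
    {"sourceIpAddress": "10.0.0.7", "healthCheckStatus": "OK"},
    {"sourceIpAddress": "10.0.0.8", "healthCheckStatus": "INVALID_STATUS_CODE"},
    {"sourceIpAddress": "10.0.0.8", "healthCheckStatus": "INVALID_STATUS_CODE"},
]

ips, failure_counts = summarize_results(sample)
print("Health check source IPs:", ips)
print("Failure counts:", failure_counts)
```

The list of source IPs is what your security rules need to allow, and the failure counts point to the error code to investigate first.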

Other cases in which health status might prove helpful include:

Health status is updated every three minutes. No finer granularity is available.

Health status does not provide historical health data.

Health Check Best Practices

Configure your health check protocol to match your application or service. If you run an HTTP service, be sure to configure an HTTP-level health check. If you run a TCP-level health check against an HTTP service, you might not get an accurate response. The TCP handshake can succeed and indicate that the service is up even when the HTTP service is incorrectly configured or having other issues. Although the health check appears good, customers might experience transaction failures. For example:
  • The backend HTTP service has issues when talking to the health check URL and the health check URL returns 5XX messages. An HTTP health check catches the message from the health check URL and marks the service as down. In this case, a TCP health check handshake succeeds and marks the service as healthy, even though the HTTP service may not be usable.
  • The backend HTTP service responds with 4XX messages because of authorization issues or no configured content. A TCP health check does not catch these errors.

Configure the Health Check

To configure the health check, use the following procedure.
  1. Open the navigation menu. Under the Core Infrastructure group, go to Networking and click Load Balancers.
  2. Click the name of the Compartment that contains the load balancer you want to modify, and then click the load balancer's name.
  3. In the Resources menu, click Backend Sets, and then click the name of the backend set you want to modify.
  4. Click Update Health Check.
  5. In the Health Check section, specify the test parameters to confirm the health of backend servers.
    All parameters are required when updating an existing health check policy.
    • Protocol: Required. Specify the protocol to use, either HTTP or TCP.
      Note

      Configure your health check protocol to match your application or service.
    • Port: Required. Specify the backend server port against which to run the health check.
      Note

      You can enter the value '0' to have the health check use the backend server's traffic port.
    • URL Path (URI): (HTTP only) Required. Specify a URL endpoint against which to run the health check.

      Example: /health (This is a commonly used path for a health check application).

    • Interval in ms: Required. Specify how frequently to run the health check, in milliseconds. Default is 10000 (10 seconds).
    • Timeout in ms: Required. Specify the maximum time in milliseconds to wait for a reply to a health check. A health check is successful only if a reply returns within this timeout period. Default is 3000 (3 seconds).
      Note

      Enter a timeout value that is smaller than the interval value to ensure the health check works correctly.
    • Number of retries: Required. Specify the number of retries to attempt before a backend server is considered "unhealthy." This number also applies when recovering a server to the "healthy" state. The default is 3.
    • Status Code: (HTTP only) Required. Specify the status code a healthy backend server must return.
    • Response Body Regex: (HTTP only) Optional. Provide a regular expression for parsing the response body from the backend server. The system treats a blank entry here as the value ".*".
      Note

      Health checks require all fields to match. Your status code and response body both must match, as specified.
  6. Click Save.
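The constraints in the procedure above can be expressed as a small validation sketch. The HealthCheckPolicy class and its field names are hypothetical and only mirror the Console fields; they are not the Load Balancing API model. The use of re.search for body matching is also an assumption, because this topic does not state whether the pattern must match the whole body.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthCheckPolicy:
    """Hypothetical container mirroring the Console fields in this procedure."""

    protocol: str                      # "HTTP" or "TCP"
    port: int                          # 0 means "use the backend's traffic port"
    url_path: Optional[str] = None     # HTTP only
    interval_ms: int = 10000
    timeout_ms: int = 3000
    retries: int = 3
    status_code: Optional[int] = None  # HTTP only
    response_body_regex: str = ""      # HTTP only; blank is treated as ".*"

    def validate(self) -> list:
        """Return a list of problems; an empty list means no problem was found."""
        problems = []
        if self.protocol not in ("HTTP", "TCP"):
            problems.append("protocol must be HTTP or TCP")
        if self.timeout_ms >= self.interval_ms:
            problems.append("timeout must be smaller than interval")
        if self.protocol == "HTTP":
            if not self.url_path:
                problems.append("HTTP checks require a URL path")
            if self.status_code is None:
                problems.append("HTTP checks require a status code")
        return problems

    def response_matches(self, status: int, body: str) -> bool:
        """Both the status code and the body must match (HTTP checks)."""
        pattern = self.response_body_regex or ".*"
        # Assumption: search semantics, so the pattern may match anywhere.
        return status == self.status_code and re.search(pattern, body) is not None


policy = HealthCheckPolicy(
    protocol="HTTP", port=0, url_path="/health",
    status_code=200, response_body_regex="Status:OK:.*",
)
print(policy.validate())
print(policy.response_matches(200, "Status:OK: all good"))
print(policy.response_matches(302, "Status:OK: all good"))
```

A response passes only when the status code and the body both match, which is why the 302 response fails even though its body matches the pattern.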

Common Side Effects of Health Check Misconfiguration

The following are common side effects of health check misconfiguration. You can use them to troubleshoot issues.
  • Wrong Port

    All the backend servers report as unhealthy. If the backend servers do not have any problems, you might have set the wrong port. The health check port must match a port on which the backend is listening and that allows traffic. OCI Logging Error: "errno":"EHOSTUNREACH","syscall":"connect".

  • Wrong Path

    All the backend servers report as unhealthy. If the backend servers do not have any problems, you might have set the wrong path for the HTTP health check. The path must match an actual application endpoint on the backend. In this scenario, you can run a curl test from a system in the same network. Example:

      $ curl -i http://10.0.0.5/health

    The response should contain the configured status code. OCI Logging Error: "msg":"invalid statusCode","statusCode":404,"expected":"200".

  • Wrong Protocol

    All the backend servers report as unhealthy. If the backend servers do not have any problems, you might have set the wrong protocol. The health check protocol must match the protocol that the backend is listening on. Only TCP and HTTP health checks are supported. For example, if your backend uses HTTPS, use TCP as the health check protocol. OCI Logging Error: "code":"EPROTO","errno":"EPROTO".

  • Wrong Status Code

    All the backend servers report as unhealthy. If the backend servers do not have any problems, the status code configured for an HTTP health check might not match the status code that the backend actually returns. A common scenario is a backend that returns a 302 when you expect a 200. In that case, the backend is probably redirecting to a login page or another location on the server. You can either fix the backend to return the expected code or use 302 in your health check configuration. OCI Logging Error: "msg":"invalid statusCode","statusCode":XX,"expected":"200", where XX is the status code that is returned.

  • Wrong Regex Pattern

    All the backend servers report as unhealthy. If the backend servers do not have any problems, you might have set a regex pattern that does not match the response body, or the backend is not returning the expected body. In this scenario, you can either change the backend to match the pattern or correct the pattern to match the backend. The following are some specific pattern examples:
    • Any Content - ".*"
    • A page returning the value "Status:OK:" - "Status:OK:.*"

    OCI Logging Error: "response match result: failed".

  • Misconfigured Network Security Groups, Security Lists, or Local Firewall

    All or some of the backend servers report as unhealthy. If the backend servers do not have any problems, you might have misconfigured the network security groups (NSGs), security lists, or local firewalls such as firewalld, iptables, or SELinux. In this scenario, you can run a curl or netcat test from a system that belongs to the same subnet and NSG as your LBaaS instance:
    • HTTP: $ curl -i http://10.0.0.5/health
    • TCP: $ nc -zvw3 10.0.0.5 443

You can check your local firewall by using the following command: firewall-cmd --list-all --zone=public. If your firewall is missing the expected rules, you can use commands like the following to add the service (this example is for HTTP on port 80):
  • firewall-cmd --zone=public --add-service=http
  • firewall-cmd --zone=public --permanent --add-service=http
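When you review many log entries, the log fragments quoted in this section can serve as lookup keys. The following is a minimal sketch that maps each fragment to the misconfiguration it is associated with above. The names LOG_HINTS and likely_cause are hypothetical, and the mapping is only a first hint, because the same error can have more than one cause.

```python
# Substrings taken from the log fragments quoted in this section, mapped to
# the misconfiguration each one is listed under. Treat the result as a hint.
LOG_HINTS = [
    ("EHOSTUNREACH", "wrong port"),
    ("EPROTO", "wrong protocol"),
    ("invalid statusCode", "wrong path or wrong status code"),
    ("response match result: failed", "wrong regex pattern"),
]


def likely_cause(log_line):
    """Return the first matching hint, or a fallback suggestion."""
    for needle, cause in LOG_HINTS:
        if needle in log_line:
            return cause
    return "unknown; check NSGs, security lists, and local firewalls"


example = '"msg":"invalid statusCode","statusCode":404,"expected":"200"'
print(likely_cause(example))
```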

Creating a Custom Health Check Page

In many scenarios, you might want to expose your own custom health check page to do a more thorough check. One option is to run a small dedicated health check application, built with the py-healthcheck library (https://pypi.org/project/py-healthcheck/), rather than relying on your existing application. The library works with Flask and Tornado; the example below uses Tornado.

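import redis
import tornado.ioloop
import tornado.web

from healthcheck import HealthCheck, TornadoHandler


def _redis_client():
    # Assumption: Redis runs on localhost:6379; adjust for your environment.
    return redis.Redis(host="localhost", port=6379)


# Add your own check function to the health check.
def redis_available():
    client = _redis_client()
    client.info()  # raises an exception if Redis is unreachable
    return True, "Redis Test Pass"


health = HealthCheck(checkers=[redis_available])

app = tornado.web.Application([
    ("/healthcheck", TornadoHandler, dict(checker=health)),
])

if __name__ == "__main__":
    app.listen(5000)  # port chosen to match the configuration values below
    tornado.ioloop.IOLoop.current().start()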

In the above example, the test page does more than confirm that the HTTP application is listening. It queries a Redis client and waits for a response to confirm that the full application is healthy before returning 200 OK. Other examples include checking for disk space or the availability of an upstream dependency. In your health check configuration, you would specify the following:
  • /healthcheck as your path
  • 5000 (the Flask default) as the port, or whichever port your health check application listens on
  • 200 as status code
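A disk space check, mentioned above as another possible test, can follow the same pattern of returning a pass flag and a message. The following is a minimal sketch that uses only the standard library. The function name disk_space_available, the path, and the 10% threshold are illustrative assumptions.

```python
import shutil


def disk_space_available(path="/", min_free_fraction=0.10):
    """Checker in the (bool, message) style used by the example above.

    The path and the 10% threshold are illustrative assumptions.
    """
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    ok = free_fraction >= min_free_fraction
    return ok, f"{free_fraction:.0%} of disk free on {path}"


ok, message = disk_space_available()
print(ok, message)
```

A function like this could be added to the checkers list next to redis_available, so that the health check page returns 200 OK only when both checks pass.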

Editing Health Check Policies

You create your health check tests when you create a backend set.

To edit an existing health check policy
  1. Open the navigation menu. Under the Core Infrastructure group, go to Networking and click Load Balancers.
  2. Click the name of the Compartment that contains the load balancer you want to modify, and then click the load balancer's name.
  3. In the Resources menu, click Backend Sets, and then click the name of the backend set you want to modify.
  4. Click Update Health Check.
  5. In the Health Check section, specify the test parameters to confirm the health of backend servers.

    Tip

    All parameters are required when updating an existing health check policy.
    • Protocol: Required. Specify the protocol to use, either HTTP or TCP.

      Important

      Configure your health check protocol to match your application or service.
    • Port: Required. Specify the backend server port against which to run the health check.

      Tip

      You can enter the value '0' to have the health check use the backend server's traffic port.
    • URL Path (URI): (HTTP only) Required. Specify a URL endpoint against which to run the health check.
    • Interval in ms: Required. Specify how frequently to run the health check, in milliseconds. Default is 10000 (10 seconds).
    • Timeout in ms: Required. Specify the maximum time in milliseconds to wait for a reply to a health check. A health check is successful only if a reply returns within this timeout period. Default is 3000 (3 seconds).

      Important

      Enter a timeout value that is smaller than the interval value to ensure the health check works correctly.
    • Number of retries: Required. Specify the number of retries to attempt before a backend server is considered "unhealthy." This number also applies when recovering a server to the "healthy" state. Default is 3.
    • Status Code: (HTTP only) Required. Specify the status code a healthy backend server must return.
    • Response Body Regex: (HTTP only) Optional. Provide a regular expression for parsing the response body from the backend server. The system treats a blank entry here as the value ".*".

      Tip

      Health checks require all fields to match. Your status code and response body both must match, as specified.
  6. Click Save.