Health Check Policies for Network Load Balancers

Set up and use health checks to determine the availability of backend servers for a network load balancer.

A health check is a test to confirm the availability of backend servers. A health check can be a request or a connection attempt. Based on a time interval you specify, the network load balancer applies the health check policy to continuously monitor backend servers. If a server fails the health check, the network load balancer takes the server temporarily out of rotation. If the server later passes the health check, the network load balancer returns it to the rotation.
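The rotation behavior described above can be sketched in a few lines. This is only an illustration of the idea, not the service's implementation: the function name `update_rotation`, the backend addresses, and the stand-in check function are all hypothetical.

```python
from typing import Callable, Dict, List


def update_rotation(
    backends: List[str],
    check: Callable[[str], bool],
    in_rotation: Dict[str, bool],
) -> List[str]:
    """Run one health check pass and return the backends currently in rotation."""
    for backend in backends:
        # A failed check takes the server out of rotation; a later pass returns it.
        in_rotation[backend] = check(backend)
    return [b for b in backends if in_rotation[b]]


# Hypothetical backends and a stand-in check result table, for illustration only.
backends = ["10.0.0.5", "10.0.0.6"]
state: Dict[str, bool] = {}
healthy_now = {"10.0.0.5": True, "10.0.0.6": False}

first_pass = update_rotation(backends, lambda b: healthy_now[b], state)
healthy_now["10.0.0.6"] = True  # the server recovers before the next interval
second_pass = update_rotation(backends, lambda b: healthy_now[b], state)
print(first_pass, second_pass)
```

In a real deployment the pass would repeat on the configured interval; the sketch runs it twice by hand to show a server leaving and rejoining the rotation.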

You configure your health check policy when you create a backend set. Health checks are available at the following protocol levels:

  • TCP-level health checks attempt to make a TCP connection with the backend servers and validate the response based on the connection status.

    If it is not practical to create a request for the protocol you are working with, you can leave out the request data. In this case, the backend is considered healthy if the TCP connection succeeds.

  • HTTP-level health checks send requests to the backend servers at a specific URI and validate the response based on the status code or entity data (body) returned.

  • HTTPS-level health checks send requests to the backend servers at a specific URI and validate the response based on the status code or entity data (body) returned over a secure and encrypted HTTPS protocol.

  • UDP-level health checks send a single request to the backend server and match the response (if received) against the response data you specify.
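As a rough illustration of the simplest case in the list above, a TCP-level check with no request data amounts to a connection attempt with a timeout. The following sketch uses Python's standard `socket` module; the function name `tcp_health_check` and the local test listener are assumptions made for the example and say nothing about how the service itself is implemented.

```python
import socket


def tcp_health_check(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Demonstration against a throwaway local listener (not a real backend).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

up_result = tcp_health_check("127.0.0.1", open_port, timeout_s=1.0)
listener.close()
down_result = tcp_health_check("127.0.0.1", open_port, timeout_s=1.0)
print(up_result, down_result)
```

The check reports only whether something accepted the connection. It says nothing about whether the application behind the port is working.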

The service provides application-specific health check capabilities to help you increase availability and reduce your application maintenance window.

Configuring Your Health Check Protocol to Match Your Application or Service

If you run an HTTP service, be sure to configure an HTTP-level health check. If you run a TCP-level health check against an HTTP service, you might not get an accurate response. The TCP handshake can succeed and indicate that the service is up even when the HTTP service is incorrectly configured or having other issues. Although the health check appears good, customers might experience transaction failures. For example:

  • The backend HTTP service has issues when talking to the health check URL and the health check URL returns 5XX messages. An HTTP health check catches the message from the health check URL and marks the service as down. In this case, a TCP health check handshake succeeds and marks the service as healthy, even though the HTTP service might not be usable.

  • The backend HTTP service responds with 4XX messages because of authorization issues or no configured content. A TCP health check does not catch these errors.
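The difference between the two check types can be reproduced locally. The sketch below starts a stand-in backend that always answers 503 and then runs a TCP-style check and an HTTP-style check against it. The handler, the function names, the `/health` path, and the expected status of 200 are assumptions chosen for the demonstration.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class BrokenHandler(BaseHTTPRequestHandler):
    """Stand-in backend whose health URL always answers 503."""

    def do_GET(self):
        self.send_response(503)
        self.end_headers()

    def log_message(self, *args):  # keep the demo output quiet
        pass


def tcp_check(host, port, timeout_s=1.0):
    """Pass if the TCP handshake succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def http_check(host, port, path="/health", expected_status=200, timeout_s=1.0):
    """Pass only if the backend returns the expected status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == expected_status
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), BrokenHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

tcp_ok = tcp_check("127.0.0.1", port)
http_ok = http_check("127.0.0.1", port)
server.shutdown()
server.server_close()
print(tcp_ok, http_ok)
```

The TCP check passes because the port accepts connections, while the HTTP check fails because of the 503 response. This is the mismatch described in the first example above.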

Health Status Indicators

The Network Load Balancer service provides health status indicators that use your health check policies to report on the general health of your load balancers and their components. You can see health status indicators on the Console List and Details pages for load balancers, backend sets, and backend servers. You also can use the API to retrieve this information.

Health status indicators have four levels. The following table provides the general meaning of each level:

| Level | Color | Description |
|---|---|---|
| OK | Green | No attention required. The resource is functioning as expected. |
| Warning | Yellow | Some reporting entities require attention. The resource is not functioning at peak efficiency or the resource is incomplete and requires further work. |
| Critical | Red | Some or all reporting entities require immediate attention. The resource is not functioning or unexpected failure is imminent. |
| Unknown | Gray | Health status cannot be determined. The resource is not responding or is in transition and might resolve to another status over time. |

The precise meaning of each level differs among load balancers, backend sets, and backend servers.

Using Health Status

At the highest level, load balancer health reflects the health of its components. The health status indicators provide information you might need to drill down and investigate an existing issue. Some common issues that the health status indicators can help you detect and correct include:

A health check is misconfigured.

In this case, all the backend servers for one or more of the affected listeners report as unhealthy. If your investigation finds that the backend servers do not have problems, then a backend set probably includes a misconfigured health check.

A listener is misconfigured.

All the backend server health status indicators report OK, but the load balancer does not pass traffic on a listener.

The listener might be configured to:

  • Listen on the wrong port.

  • Use the wrong protocol.

  • Use the wrong policy.

If your investigation shows that the listener is not at fault, check the security list configuration.

A security rule is misconfigured.

Health status indicators help you diagnose two cases of misconfigured security rules:

  • All entity health status indicators report OK, but traffic does not flow (as with misconfigured listeners). If the listener is not at fault, check the security rule configuration.

  • All entity health statuses report as unhealthy. You have checked your health check configuration and your services run properly on your backend servers.

    In this case, your security rules might not include the IP range for the source of the health check requests. You can find the health check source IP on the Details page for each backend server. You can also use the API to find the IP in the sourceIpAddress field of the HealthCheckResult object.

    Note

    Source IP

    The source IP for health check requests comes from a compute instance managed by the Load Balancer service.
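Once you know the health check source IP, you can compare it against the CIDR ranges your security rules allow. The following is a minimal sketch using Python's standard `ipaddress` module. The sample result dictionary and the CIDR lists are hypothetical; only the `sourceIpAddress` field name comes from the text above, and the shape of the real API response is not assumed here.

```python
import ipaddress
from typing import Iterable


def source_ip_allowed(source_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if source_ip falls inside at least one allowed CIDR range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical values for illustration only.
health_check_result = {"sourceIpAddress": "10.0.1.7"}
narrow_rules = ["10.0.0.0/24"]  # does not cover 10.0.1.7
wide_rules = ["10.0.0.0/16"]    # covers 10.0.1.7

blocked = not source_ip_allowed(health_check_result["sourceIpAddress"], narrow_rules)
allowed = source_ip_allowed(health_check_result["sourceIpAddress"], wide_rules)
print(blocked, allowed)
```

If the source IP is not covered by any ingress rule, health checks never reach the backend, which is consistent with every entity reporting as unhealthy.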

One or more of the backend servers reports as unhealthy.

A backend server might be unhealthy or the health check might be misconfigured. To see the corresponding error code, check the status field on the backend server's Details page. You can also use the API to find the error code in the healthCheckStatus field of the HealthCheckResult object.

Other cases in which health status might prove helpful include:

Health status is updated every three minutes. No finer granularity is available.

Health status does not provide historical health data.

Health Check Best Practices

Configure your health check protocol to match your application or service. If you run an HTTP service, be sure to configure an HTTP-level health check. If you run a TCP-level health check against an HTTP service, you might not get an accurate response. The TCP handshake can succeed and indicate that the service is up even when the HTTP service is incorrectly configured or having other issues. Although the health check appears good, customers might experience transaction failures.

Configure the Health Check

To configure the health check, use the following procedure:

  1. Open the navigation menu, click Networking, and then click Load balancers. Click Network load balancer. The Network load balancers page appears.

  2. Select the Compartment from the list.

    All load balancers and network load balancers in that compartment are listed in tabular form.

  3. (Optional) Select a State from the list to limit the load balancers displayed to that state.

  4. (Optional) Uncheck Load balancer under Type to display only network load balancers.

  5. Select the network load balancer containing the backend set that you want to edit. Its Details page appears.

  6. Click Backend sets under Resources.

    The Backend sets list appears. All backend sets are listed in tabular form.

  7. Click the backend set that you want to edit. Its Details dialog box appears.

  8. Click Update health check.

    The Health check section appears.

  9. Specify the test parameters to confirm the health of backend servers.

    All parameters are required when updating an existing health check policy.

    • Protocol: Specify the protocol to use, either HTTP or TCP.

      Note

      Configure your health check protocol to match your application or service.

    • Port: Specify the backend server port against which to run the health check.

      Note

      You can enter the value '0' to have the health check use the backend server's traffic port.

    • URL path (URI): (HTTP only) Specify a URL endpoint against which to run the health check.

      For example:

      /health 

      (This value is a commonly used path for a health check application.)

    • Interval in MS: Specify how frequently to run the health check, in milliseconds. Default is 10000 (10 seconds).

    • Timeout in MS: Specify the maximum time in milliseconds to wait for a reply to a health check. A health check is successful only if a reply returns within this timeout period. Default is 3000 (3 seconds).

      Note

      Enter a timeout value that is smaller than the interval value to ensure the health check works correctly.

    • Number of retries: Specify the number of retries to attempt before a backend server is considered "unhealthy." This number also applies when recovering a server to the "healthy" state. The default is 3.

    • Status code: (HTTP only) Required. Specify the status code a healthy backend server must return.

    • Response body regex: (HTTP only) Optional. Provide a regular expression for parsing the response body from the backend server. The system treats a blank entry here as the value ".*".

      Note

      Health checks require all fields to match. Your status code and response body both must match, as specified.

  10. Click Save.
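The parameters in step 9 interact in a few ways that are easy to get wrong: the timeout must be smaller than the interval, a port of 0 means the traffic port, a blank regex behaves like ".*", and the retry count applies in both directions. The sketch below models those rules as one possible reading. The class names, the default status code of 200, the use of `re.search`, and the interpretation of retries as consecutive results are assumptions, not the service's actual logic.

```python
import re
from dataclasses import dataclass


@dataclass
class HealthCheckPolicy:
    protocol: str = "HTTP"
    port: int = 0                   # 0 means "use the backend's traffic port"
    url_path: str = "/health"
    interval_ms: int = 10000        # documented default
    timeout_ms: int = 3000          # documented default
    retries: int = 3                # documented default
    status_code: int = 200          # assumed value; the document gives no default
    response_body_regex: str = ""   # a blank entry is treated as ".*"

    def validate(self) -> None:
        if self.timeout_ms >= self.interval_ms:
            raise ValueError("timeout must be smaller than interval")

    def effective_port(self, traffic_port: int) -> int:
        return traffic_port if self.port == 0 else self.port

    def response_matches(self, status: int, body: str) -> bool:
        # Both the status code and the body pattern must match.
        pattern = self.response_body_regex or ".*"
        return status == self.status_code and re.search(pattern, body) is not None


class RetryTracker:
    """Flip the health state only after `retries` consecutive contrary results."""

    def __init__(self, retries: int, healthy: bool = True):
        self.retries = retries
        self.healthy = healthy
        self.streak = 0

    def record(self, passed: bool) -> bool:
        if passed == self.healthy:
            self.streak = 0
        else:
            self.streak += 1
            if self.streak >= self.retries:
                self.healthy = passed
                self.streak = 0
        return self.healthy


policy = HealthCheckPolicy(response_body_regex="Status:OK:.*")
policy.validate()
tracker = RetryTracker(policy.retries)
states = [tracker.record(p) for p in [False, False, False, True, True, True]]
print(policy.effective_port(8080), policy.response_matches(200, "Status:OK: ready"), states)
```

With the default of three retries, the sketch marks the server unhealthy on the third consecutive failure and healthy again on the third consecutive success.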

Common Side Effects of Health Check Misconfiguration

The following are common side effects of health check misconfiguration. You can use them to troubleshoot issues.

  • Wrong port

    In this scenario, all the backend servers report as unhealthy. If the backend servers do not have any problems, you might have made a mistake setting the port. The port must be listening and must allow traffic on the backend.

    OCI Logging Error: "errno":"EHOSTUNREACH","syscall":"connect"

  • Wrong path

    In this scenario, all the backend servers report as unhealthy. If the backend servers do not have any problems, you might have made a mistake setting the path for the HTTP health check. The path must match an actual application on the backend. You can verify the path with a curl test from a system in the same network.

    For example:

    $ curl -i http://10.0.0.5/health

    The response should contain the configured status code.

    OCI Logging Error:

    "msg":"invalid statusCode","statusCode":404,"expected":"200"

  • Wrong protocol

    In this scenario, all the backend servers report as unhealthy. If the backend servers do not have any problems, you might have made a mistake setting the protocol. The protocol must match the protocol that is listening on the backend. For example, only TCP and HTTP health checks are supported, so if your backend uses HTTPS, use TCP as the protocol.

    OCI Logging Error:

    "code":"EPROTO","errno":"EPROTO"

  • Wrong status code

    In this scenario, all the backend servers report as unhealthy. If the backend servers do not have any problems, the status code configured for the HTTP health check might not match the status code that the backend actually returns. A common scenario is a backend that returns a 302 when you expect a 200. This result usually means that the backend is redirecting the request to a login page or another location on the server. In this scenario, you can either fix the backend to return the expected code or use 302 in your health check configuration.

    OCI Logging Error:

    "msg":"invalid statusCode","statusCode":XX,"expected":"200"

    where XX is the status code that is returned.

  • Wrong regex pattern

    In this scenario, all the backend servers report as unhealthy. If the backend servers do not have any problems, you might have set a regex pattern that is not consistent with the response body, or the backend is not returning the expected body. In this scenario, you can either change the backend to match the pattern or correct the pattern to match the backend. The following are some specific pattern examples:

    • Any content: ".*"

    • A page returning the value "Status:OK:": "Status:OK:.*"

    OCI Logging Error: "response match result: failed"

  • Misconfigured network security groups (NSGs), security lists, or local firewall

    In this scenario, all or some of the backend servers report as unhealthy. If the backend servers do not have any problems, you might have made a mistake configuring the NSGs, security lists, or local firewalls such as firewalld, iptables, or SELinux. In this scenario, you can use a curl or netcat test from a system that belongs to the same subnet and NSG as your load balancer instance.

    For example:

    HTTP: $ curl -i http://10.0.0.5/health

    TCP: $ nc -zvw3 10.0.0.5 443

    You can check your local firewall by using the following command:

    $ firewall-cmd --list-all --zone=public

    If your firewall is missing the expected rules, you can use commands like the following to add the service (this example is for HTTP on port 80):

    $ firewall-cmd --zone=public --add-service=http

    $ firewall-cmd --zone=public --permanent --add-service=http
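When you review many log entries, it can help to map the error fragments quoted in this section to their likely causes. The following sketch does that with plain substring matching. The function name and the mapping table are illustrative assumptions built only from the fragments shown above; real log entries might contain other fields or other errors.

```python
from typing import Optional

# Signatures taken from the log fragments quoted in this section.
LOG_SIGNATURES = [
    ("EHOSTUNREACH", "wrong port"),
    ("EPROTO", "wrong protocol"),
    ("invalid statusCode", "wrong path or wrong status code"),
    ("response match result: failed", "wrong regex pattern"),
]


def likely_cause(log_line: str) -> Optional[str]:
    """Return the likely misconfiguration for a log line, or None if unrecognized."""
    for signature, cause in LOG_SIGNATURES:
        if signature in log_line:
            return cause
    return None


sample = '"msg":"invalid statusCode","statusCode":404,"expected":"200"'
print(likely_cause(sample))
```

A result of None means that the line does not match any fragment listed here, in which case the security rule and firewall checks described above are a reasonable next step.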
