Editing Health Check Policies
This topic describes how to modify health check policies for a backend set.
Required IAM Policy
To use Oracle Cloud Infrastructure, you must be granted security access in a policy by an administrator. This access is required whether you're using the Console or the REST API with an SDK, CLI, or other tool. If you get a message that you don’t have permission or are unauthorized, verify with your administrator what type of access you have and which compartment to work in.
For administrators: For a typical policy that gives access to load balancers and their components, see Let network admins manage load balancers.
Also, be aware that a policy statement with inspect load-balancers gives the specified group the ability to see all information about the load balancers. For more information, see Details for Load Balancing.
If you're new to policies, see Getting Started with Policies and Common Policies.
Working with Health Check Policies
A health check is a test to confirm the availability of backend servers. A health check can be a request or a connection attempt. Based on a time interval you specify, the load balancer applies the health check policy to continuously monitor backend servers. If a server fails the health check, the load balancer takes the server temporarily out of rotation. If the server subsequently passes the health check, the load balancer returns it to the rotation.
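The rotation behavior described above can be pictured as a small state machine. The following is a minimal sketch, not the service's implementation: the class name `BackendState` is hypothetical, and it assumes that a server changes state only after a configurable number of consecutive contradicting results (the "number of retries" setting described later in this topic).

```python
from dataclasses import dataclass


@dataclass
class BackendState:
    """Tracks whether one backend server is in rotation (hypothetical model)."""
    retries: int = 3          # consecutive results needed to change state (assumption)
    in_rotation: bool = True
    streak: int = 0           # consecutive results that contradict the current state

    def record(self, passed: bool) -> bool:
        """Record one health check result and return the updated rotation state."""
        if passed == self.in_rotation:
            # The result agrees with the current state, so reset the streak.
            self.streak = 0
        else:
            self.streak += 1
            if self.streak >= self.retries:
                # Enough contradicting results in a row: flip the state.
                self.in_rotation = passed
                self.streak = 0
        return self.in_rotation


state = BackendState(retries=3)
history = [state.record(result) for result in [False, False, False, True, True, True]]
print(history)  # [True, True, False, False, False, True]
```

In this model the server leaves rotation on the third consecutive failure and returns on the third consecutive success.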
You configure your health check policy when you create a backend set. You can configure TCP-level or HTTP-level health checks for your backend servers.
- TCP-level health checks attempt to make a TCP connection with the backend servers and validate the response based on the connection status.
- HTTP-level health checks send requests to the backend servers at a specific URI and validate the response based on the status code or entity data (body) returned.
The service provides application-specific health check capabilities to help you increase availability and reduce your application maintenance window.
Configure your health check protocol to match your application or service.
If you run an HTTP service, be sure to configure an HTTP-level health check. If you run a TCP-level health check against an HTTP service, you might not get an accurate response. The TCP handshake can succeed and indicate that the service is up even when the HTTP service is incorrectly configured or having other issues. Although the health check appears good, customers might experience transaction failures. For example:
- The backend HTTP service has issues when talking to the health check URL, and the health check URL returns 5XX messages. An HTTP health check catches the message from the health check URL and marks the service as down. In this case, a TCP health check handshake succeeds and marks the service as healthy, even though the HTTP service might not be usable.
- The backend HTTP service responds with 4XX messages because of authorization issues or no configured content. A TCP health check does not catch these errors.
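The difference between the two check types can be reproduced locally. The following sketch assumes nothing about the Load Balancing service itself: `tcp_check`, `http_check`, and `BrokenHandler` are hypothetical names, and the backend is a throwaway local server whose health URL always returns 503.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class BrokenHandler(BaseHTTPRequestHandler):
    """Stand-in backend whose health URL always fails with a 503."""

    def do_GET(self):
        self.send_response(503)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def tcp_check(host, port, timeout=3.0):
    """Pass if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host, port, path="/health", expected_status=200, timeout=3.0):
    """Pass only if the backend returns the expected HTTP status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == expected_status
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), BrokenHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

tcp_result = tcp_check(host, port)
http_result = http_check(host, port)
server.shutdown()
server.server_close()

print("TCP check passed:", tcp_result)    # True: the handshake succeeds
print("HTTP check passed:", http_result)  # False: the 503 is detected
```

The TCP check reports the broken service as healthy because it looks only at the connection, which is the situation the examples above warn about.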
Health Status
The Load Balancing service provides health status indicators that use your health check policies to report on the general health of your load balancers and their components. You can see health status indicators on the Console List and Details pages for load balancers, backend sets, and backend servers. You also can use the Load Balancing API to retrieve this information.
Health status indicators have four levels. The following table provides the general meaning of each level:
Level | Color | Description |
---|---|---|
OK | Green | No attention required. The resource is functioning as expected. |
Warning | Yellow | Some reporting entities require attention. The resource is not functioning at peak efficiency or the resource is incomplete and requires further work. |
Critical | Red | Some or all reporting entities require immediate attention. The resource is not functioning or unexpected failure is imminent. |
Unknown | Gray | Health status cannot be determined. The resource is not responding or is in transition and might resolve to another status over time. |
The precise meaning of each level differs among the following components:
Using Health Status
At the highest level, load balancer health reflects the health of its components. The health status indicators provide information you might need to drill down and investigate an existing issue. Some common issues that the health status indicators can help you detect and correct include:
A health check is misconfigured.
In this case, all the backend servers for one or more of the affected listeners report as unhealthy. If your investigation finds that the backend servers do not have problems, then a backend set probably includes a misconfigured health check.
A listener is misconfigured.
All the backend server health status indicators report OK, but the load balancer does not pass traffic on a listener.
The listener might be configured to:
- Listen on the wrong port.
- Use the wrong protocol.
- Use the wrong policy.
If your investigation shows that the listener is not at fault, check the security list configuration.
A security rule is misconfigured.
Health status indicators help you diagnose two cases of misconfigured security rules:
- All entity health status indicators report OK, but traffic does not flow (as with misconfigured listeners). If the listener is not at fault, check the security rule configuration.
- All entity health statuses report as unhealthy. You have checked your health check configuration and your services run properly on your backend servers.
  In this case, your security rules might not include the IP range for the source of the health check requests. You can find the health check source IP on the Details page for each backend server. You can also use the API to find the IP in the sourceIpAddress field of the HealthCheckResult object.
  Note: The source IP for health check requests comes from a Compute instance managed by the Load Balancing service.
One or more of the backend servers reports as unhealthy.
A backend server might be unhealthy or the health check might be misconfigured. To see the corresponding error code, check the status field on the backend server's Details page. You can also use the API to find the error code in the healthCheckStatus field of the HealthCheckResult object.
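When you retrieve health check results through the API, grouping the failures by source IP can show quickly whether a specific health check source is being blocked. The sketch below works on already-parsed JSON and makes no API calls. The function name `failing_sources`, the IP addresses, and the status strings in the sample data are illustrative assumptions; only the sourceIpAddress and healthCheckStatus field names come from this topic.

```python
from collections import defaultdict


def failing_sources(results):
    """Group non-OK health check results by the source IP that ran the check."""
    failures = defaultdict(list)
    for result in results:
        status = result.get("healthCheckStatus", "UNKNOWN")
        if status != "OK":
            failures[result.get("sourceIpAddress", "unknown")].append(status)
    return dict(failures)


# Hypothetical sample data shaped like a list of HealthCheckResult objects.
sample_results = [
    {"sourceIpAddress": "10.0.0.7", "healthCheckStatus": "OK"},
    {"sourceIpAddress": "10.0.1.9", "healthCheckStatus": "CONNECT_FAILED"},
    {"sourceIpAddress": "10.0.1.9", "healthCheckStatus": "CONNECT_FAILED"},
]

for source_ip, statuses in failing_sources(sample_results).items():
    print(f"{source_ip}: {statuses}")
```

If every failure comes from one source IP while another source reports OK, the security rules for that source's IP range are a reasonable first place to look.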
Other cases in which health status might prove helpful include:
- VCN network security groups or security lists block traffic.
- Compute instances have misconfigured route tables.
Health status is updated every three minutes. No finer granularity is available.
Health status does not provide historical health data.
Health Check Best Practices
- The backend HTTP service has issues when communicating with the health check URL and the health check URL returns 5nn messages. An HTTP health check catches the message from the health check URL and marks the service as down. In this case, a TCP health check handshake succeeds and marks the service as healthy, even though the HTTP service may not be usable.
- The backend HTTP service responds with 4nn messages because of authorization issues or no configured content. A TCP health check does not catch these errors.
Configuring the Health Check
- Open the navigation menu, click Networking, and then click Load Balancers.
- Click the name of the compartment that contains the load balancer you want to modify, and then click the name of the load balancer.
- In the Resources menu, click Backend Sets and then click the name of the backend set you want to modify.
- Click Update Health Check.
- In the Health Check section, specify the test parameters to confirm the health of backend servers. All parameters are required when updating an existing health check policy.
  - Protocol: Required. Specify the protocol to use, either HTTP or TCP.
    Note: Configure your health check protocol to match your application or service.
  - Port: Required. Specify the backend server port against which to run the health check.
    Note: You can enter the value 0 to indicate that the health check should use the traffic port of the backend server.
  - URL Path (URI): (HTTP only) Required. Specify a URL endpoint against which to run the health check. Example: /health (This is a commonly used path for a health check application.)
  - Interval in ms: Required. Specify how frequently to run the health check, in milliseconds. Default is 10000 (10 seconds).
  - Timeout in ms: Required. Specify the maximum time in milliseconds to wait for a reply to a health check. A health check is successful only if a reply returns within this timeout period. Default is 3000 (3 seconds).
    Note: Enter a timeout value that is smaller than the interval value to ensure the health check works correctly.
  - Number of retries: Required. Specify the number of retries to attempt before a backend server is considered unhealthy. This number also applies when recovering a server to the healthy state. The default is 3.
  - Status Code: (HTTP only) Required. Specify the status code a healthy backend server must return.
  - Response Body Regex: (HTTP only) Optional. Provide a regular expression for parsing the response body from the backend server. The system treats a blank entry here as the value .*
    Note: Health checks require all fields to match. Your status code and response body both must match, as specified.
- Click Save.
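Before saving, it can help to sanity-check the values against the constraints listed in the steps above. The following is a minimal sketch, assuming a hypothetical `HealthCheckConfig` container with field names invented for illustration; it is not an SDK type. The lower bound on retries and the use of Python's regex dialect to test the pattern are also assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HealthCheckConfig:
    """Hypothetical container mirroring the Console fields described above."""
    protocol: str
    port: int
    interval_ms: int = 10000
    timeout_ms: int = 3000
    retries: int = 3
    url_path: Optional[str] = None      # HTTP only
    status_code: Optional[int] = None   # HTTP only
    response_body_regex: str = ""       # HTTP only; blank is treated as ".*"


def validate(config: HealthCheckConfig) -> List[str]:
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    if config.protocol not in ("HTTP", "TCP"):
        problems.append("protocol must be HTTP or TCP")
    if not 0 <= config.port <= 65535:
        problems.append("port must be between 0 and 65535 (0 means the traffic port)")
    if config.timeout_ms >= config.interval_ms:
        problems.append("timeout must be smaller than the interval")
    if config.retries < 1:  # assumed lower bound
        problems.append("retries must be at least 1")
    if config.protocol == "HTTP":
        if not config.url_path:
            problems.append("HTTP checks need a URL path")
        if config.status_code is None:
            problems.append("HTTP checks need an expected status code")
        try:
            # Python's regex dialect may differ from the one the service uses.
            re.compile(config.response_body_regex or ".*")
        except re.error as exc:
            problems.append(f"response body regex does not compile: {exc}")
    return problems


good = HealthCheckConfig("HTTP", 80, url_path="/health", status_code=200)
bad = HealthCheckConfig("HTTP", 80, interval_ms=3000, timeout_ms=5000)
print(validate(good))  # []
print(validate(bad))
```

The second configuration is flagged three times: the timeout exceeds the interval, and the URL path and status code that HTTP checks require are both missing.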
Common Side Effects of Health Check Misconfiguration
- Wrong Port
In this scenario, all of the backend servers report as unhealthy. If the backend servers do not have any problems, you might have set the wrong port. The port must match a port on the backend that is listening and allows traffic.
OCI Logging Error:
errno:EHOSTUNREACH, syscall:connect
- Wrong Path
In this scenario, all of the backend servers report as unhealthy. If the backend servers do not have any problems, you might have set the wrong path for the HTTP health check. The path must match an actual application on the backend. In this scenario, you can use the curl utility to test from a system in the same network. For example:
$ curl -i http://backend_ip_address/health
If the path is correct, the response contains the configured status code.
OCI Logging Error:
"msg":"invalid statusCode","statusCode":404,"expected":"200"
- Wrong Protocol
In this scenario, all of the backend servers report as unhealthy. If the backend servers do not have any problems, you might have set the wrong protocol. The protocol must match the protocol that is listening on the backend. Only TCP and HTTP health checks are supported. For example, if your backend uses HTTPS, use TCP as the health check protocol.
OCI Logging Error:
code:EPROTO, errno:EPROTO
- Wrong Status Code
In this scenario, all of the backend servers report as unhealthy. If the backend servers do not have any problems, for an HTTP health check you might have configured a status code that does not match the status code the backend actually returns. A common scenario is when a backend returns a 302 status code but you are expecting a 200 status code. This is likely the backend sending you to a login page or another location on the server. In this scenario, you can either fix the backend to return the expected code or use 302 in your health check configuration.
OCI Logging Error:
msg:invalid statusCode, statusCode:nnn,expected:200
where nnn is the status code that is returned.
- Wrong Regex Pattern
In this scenario, all of the backend servers report as unhealthy. If the backend servers do not have any problems, you might have set a regex pattern that is not consistent with the body, or the backend is not returning the expected body. In this scenario, you can either change the backend to match the pattern or correct the pattern to match the backend. The following are some specific pattern examples.
  - Any Content - .*
  - A page returning the value Status:OK: - Status:OK:.*
OCI Logging Error:
response match result: failed
- Misconfigured Network Security Groups, Security Lists, or Local Firewall
All or some of the backend servers report as unhealthy. If the backend servers do not have any problems, then you might have improperly configured the network security groups, security lists, or local firewalls (such as firewalld, iptables, or SELinux). In this scenario, you can use either the curl or netcat utilities to test from a system that belongs to the same subnet and network security group as your load balancer instance. For example:
HTTP: $ curl -i http://backend_ip_address/health
TCP: $ nc -zvw3 backend_ip_address 443
To check a local firewalld configuration, you can use a command like this:
firewall-cmd --list-all --zone=public
If your firewall is missing the expected rules, then you can use a command set like this to add the service (this example is for HTTP port 80):
firewall-cmd --zone=public --add-service=http
firewall-cmd --zone=public --permanent --add-service=http
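The manual curl tests above can be wrapped in a small probe that maps each failure to the misconfiguration it most likely indicates. This is a sketch under stated assumptions: `probe` and `DemoHandler` are hypothetical names, the backend is a local stand-in server, and the body is matched with `re.search`, which might not be how the service applies the regex.

```python
import http.client
import re
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def probe(host, port, path="/health", expected_status=200, body_regex=".*", timeout=3.0):
    """Run one HTTP check and return a short label describing the outcome."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read().decode("utf-8", errors="replace")
    except socket.timeout:
        return "timed out: check firewalls and security rules"
    except (OSError, http.client.HTTPException) as exc:
        name = exc.__class__.__name__
        return f"connection failed ({name}): check port, protocol, and security rules"
    finally:
        conn.close()
    if response.status != expected_status:
        return f"status {response.status}, expected {expected_status}: check path and status code"
    if re.search(body_regex, body) is None:
        return "body does not match the regex: check the pattern or the backend response"
    return "healthy"


class DemoHandler(BaseHTTPRequestHandler):
    """Local stand-in backend: /health returns 200 with the body 'Status:OK:'."""

    def do_GET(self):
        found = self.path == "/health"
        body = b"Status:OK:" if found else b"not found"
        self.send_response(200 if found else 404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

results = {
    "correct settings": probe(host, port, body_regex="Status:OK:.*"),
    "wrong path": probe(host, port, path="/helth"),
    "wrong regex": probe(host, port, body_regex="Status:FAIL"),
}
server.shutdown()
server.server_close()

for case, outcome in results.items():
    print(f"{case}: {outcome}")
```

Run the probe from a system in the same subnet and network security group as the load balancer, so that it passes through the same security rules as the real health check requests.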
Creating a Custom Health Check Page
In many scenarios, you might want to expose your own custom health check page to perform a more thorough check. One approach is to run a small dedicated health check application built with the py-healthcheck package (https://pypi.org/project/py-healthcheck/), rather than relying on your existing application. The following example uses the package's Tornado handler:
import tornado.web
from healthcheck import TornadoHandler, HealthCheck, EnvironmentDump

# Add your own check function to the health check.
# _redis_client() is a helper that your application must provide.
def redis_available():
    client = _redis_client()
    info = client.info()
    return True, "Redis Test Pass"

health = HealthCheck(checkers=[redis_available])

app = tornado.web.Application([
    ("/healthcheck", TornadoHandler, dict(checker=health)),
])
This example calls the redis client and waits for a response to ensure the full application is healthy before returning a 200 status code. Other example checks include checking for disk space or the availability of an upstream dependency. In your health check configuration, specify the following:
- /healthcheck as your path
- The port your application listens on (for example, 5000, the Flask default) as the port
- 200 as the status code
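A disk space check, mentioned above as another possible check, could look like the following. This is a minimal sketch: the function name `disk_space_available` and the 100 MB default threshold are assumptions, and the function only mirrors the (ok, message) return style of the redis check so that it could be registered the same way.

```python
import shutil


def disk_space_available(path=".", min_free_bytes=100 * 1024 * 1024):
    """Return (ok, message) depending on the free space on the volume holding path."""
    free = shutil.disk_usage(path).free
    if free >= min_free_bytes:
        return True, f"{free} bytes free on {path}"
    return False, f"only {free} bytes free on {path}, need {min_free_bytes}"


ok, message = disk_space_available(min_free_bytes=0)
print(ok, message)
```

Keeping each check small and fast matters here, because the whole page must respond within the health check timeout.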
Editing Health Check Policies
You create your health check tests when you create a backend set.
- Open the navigation menu, click Networking, and then click Load Balancers.
- Click the name of the Compartment that contains the load balancer you want to modify, and then click the load balancer's name.
- In the Resources menu, click Backend Sets, and then click the name of the backend set you want to modify.
- Click Update Health Check.
- In the Health Check section, specify the test parameters to confirm the health of backend servers.
  Tip: All parameters are required when updating an existing health check policy.
  - Protocol: Required. Specify the protocol to use, either HTTP or TCP.
  - Port: Required. Specify the backend server port against which to run the health check.
    Tip: You can enter the value '0' to have the health check use the backend server's traffic port.
  - URL Path (URI): (HTTP only) Required. Specify a URL endpoint against which to run the health check.
  - Interval in ms: Required. Specify how frequently to run the health check, in milliseconds. Default is 10000 (10 seconds).
  - Timeout in ms: Required. Specify the maximum time in milliseconds to wait for a reply to a health check. A health check is successful only if a reply returns within this timeout period. Default is 3000 (3 seconds).
    Important: Enter a timeout value that is smaller than the interval value to ensure the health check works correctly.
  - Number of retries: Required. Specify the number of retries to attempt before a backend server is considered "unhealthy." This number also applies when recovering a server to the "healthy" state. Default is 3.
  - Status Code: (HTTP only) Required. Specify the status code a healthy backend server must return.
  - Response Body Regex: (HTTP only) Optional. Provide a regular expression for parsing the response body from the backend server. The system treats a blank entry here as the value ".*".
    Tip: Health checks require all fields to match. Your status code and response body both must match, as specified.
- Click Save.
Using the API
For information about using the API and signing requests, see REST APIs and Security Credentials. For information about SDKs, see Software Development Kits and Command Line Interface.
Use this API operation to edit a backend set's health check policy:
Use these API operations to retrieve health status information: