Load balancers use health-checks to determine whether a service instance is working properly and can process requests from clients. If the health-checks succeed, the load balancer includes the service instance in the pool of available instances and routes requests to it according to the existing scheduling rules. If the health-checks fail, the instance is removed from the load balancer's scheduling list.
A health-check is considered failed if the response differs from the expected one, or if no response is received within a specified timeout. The timeout must be tuned carefully. If it is too short, a sporadically overloaded service that is slow to respond can be considered down. If it is too long, the load balancer takes too much time to detect failures, and users notice the lack of response.
The simplest health-check is to try to open a TCP connection to the service instance. However, this health-check only proves that the application is listening on the assigned port. It does not show that the instance can process requests. To better establish that the instance is properly working, the health-check must actually exercise the service instance.
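The difference between the two kinds of health-check can be sketched in a few lines of Python. This is only an illustration, not the implementation used by any particular load balancer. The function names `tcp_check` and `http_check` are hypothetical, and the timeout is applied to each blocking socket operation rather than to the check as a whole.

```python
import http.client
import socket


def tcp_check(host: str, port: int, timeout: float) -> bool:
    """Return True if a TCP connection can be opened within `timeout` seconds.

    This only proves that something is listening on the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False


def http_check(host: str, port: int, path: str,
               expected_status: int, timeout: float) -> bool:
    """Return True if GET `path` is answered with `expected_status`.

    This exercises the service: a request is parsed and a response produced.
    http.client does not follow redirects, so a 302 is reported as 302.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == expected_status
    except (OSError, http.client.HTTPException):
        return False  # no response in time, or a malformed response
    finally:
        conn.close()
```

For example, a check of an Access Manager instance could be written as `http_check("am1.example.com", 80, "/amserver/isAlive.jsp", 200, 10.0)`, where the host name and port are placeholders.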
The load balancer performs health-checks at a specified interval. The interval needs to be as short as possible so that the load balancer will quickly detect failures. However, too many health-check requests can cause performance degradation. In the worst case, frequent health-checks can overload the service instances.
To determine if a service instance is down, the load balancer monitors the number of consecutive failed health-checks. If this number reaches a specified threshold, the instance is considered down. The time it takes to make this determination equals the consecutive-failure threshold multiplied by the health-check interval. During this time, the load balancer still considers the failed instance to be operating correctly, and users will notice a lack of response.
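The threshold logic and the detection-time arithmetic can be illustrated with a short Python sketch. The names `InstanceHealth` and `detection_time` are hypothetical, and returning an instance to the pool after a single successful check is a simplifying assumption. Actual load balancers may require several successes before restoring an instance.

```python
from dataclasses import dataclass


@dataclass
class InstanceHealth:
    """Tracks consecutive failed health-checks for one service instance."""
    threshold: int
    consecutive_failures: int = 0
    in_pool: bool = True

    def record(self, check_succeeded: bool) -> bool:
        """Record one health-check result and return whether the instance is in the pool."""
        if check_succeeded:
            # Assumption: a single success resets the count and restores the instance.
            self.consecutive_failures = 0
            self.in_pool = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.in_pool = False
        return self.in_pool


def detection_time(threshold: int, interval_seconds: float) -> float:
    """Time to declare an instance down: threshold multiplied by interval."""
    return threshold * interval_seconds


print(detection_time(3, 60))  # 180 seconds with a 60-second interval
print(detection_time(3, 30))  # 90 seconds with a 30-second interval
```

With a threshold of 3, an instance that stops responding can therefore keep receiving requests for a minute and a half to three minutes, depending on the interval chosen.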
The health-check parameters need to be tuned separately for each service module. The following table specifies health-check parameter values that can be used as a starting point for the reference configuration.
Table 3–5 Specification for Load Balancer Health-Checks
Parameter | Directory Service | Access Manager Service | Portal Service | Gateway Service |
---|---|---|---|---|
Health-check Type | LDAP (simple, anonymous bind) | HTTP | HTTP | HTTP |
Query | DN: <None>; Base: dc=pstest,dc=com; Scope: Base; Query: (objectclass=*) | GET /amserver/isAlive.jsp | GET /portal | GET |
Expected Result | Any LDAP success code | HTTP 200 | HTTP 302 | HTTP 302 |
Health-check Timeout | 20 seconds | 10 seconds | 5 seconds | 5 seconds |
Interval Between Checks | 60 seconds | 30 seconds | 30 seconds | 30 seconds |
Consecutive Failed Health-check Threshold | 3 | 3 | 3 | 3 |
In the reference configuration, Gateway SSL sessions are terminated at the load balancer, and the Gateway instances run plain HTTP. If the SSL sessions are instead terminated at the Gateway instances, then the health-check for the Gateway service needs to be configured to use the SSL protocol.
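The effect of this choice on the health-check can be sketched in Python as follows. The function name `gateway_check` and the default path `/` are assumptions made for illustration, while the expected status and timeout defaults follow the Gateway values in Table 3–5. The default TLS context verifies the server certificate, so an instance with a certificate that the checking host does not trust would fail the check unless a suitable context is supplied.

```python
import http.client
import ssl
from typing import Optional


def gateway_check(host: str, port: int, path: str = "/",
                  expected_status: int = 302, timeout: float = 5.0,
                  use_tls: bool = False,
                  tls_context: Optional[ssl.SSLContext] = None) -> bool:
    """Return True if GET `path` is answered with `expected_status`.

    use_tls=False: SSL is terminated at the load balancer (plain HTTP check).
    use_tls=True:  SSL is terminated at the Gateway instance (HTTPS check).
    """
    if use_tls:
        context = tls_context or ssl.create_default_context()
        conn = http.client.HTTPSConnection(host, port, timeout=timeout,
                                           context=context)
    else:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == expected_status
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and failed TLS handshakes.
        return False
    finally:
        conn.close()
```

A plain HTTP check sent to an instance that expects SSL, or an SSL check sent to an instance that runs plain HTTP, fails even though the instance is healthy, so the check protocol has to match where the SSL sessions are terminated.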