Troubleshooting Load Balancing
Describes troubleshooting methods for addressing load balancer issues.
Enable Oracle Cloud Infrastructure load balancing access and error logs to troubleshoot common errors. See Logging for Load Balancers for more information.
Troubleshooting an HTTP 502 Bad Gateway Error
Learn how to troubleshoot an HTTP 502 bad gateway error for a Load Balancer resource.
In addition to monitoring and management, load balancing logging helps you to identify, isolate, and troubleshoot issues with your load balancer infrastructure. The following procedure illustrates how to troubleshoot a 502 Bad Gateway error encountered when deploying a new web application, example.com. The example uses an Oracle Cloud Infrastructure public load balancer as the front end in a development environment. Requests to the application fail with a 502 Bad Gateway error in the browser. Troubleshoot the issue using the load balancer access and error logs, as follows:
-
Confirm the error using the curl utility, as follows:
curl -v http://example.com
> GET / HTTP/1.1
> Host: 192.0.2.99
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 502 Bad Gateway
< Content-Type: text/html
< Content-Length: 161
< Connection: keep-alive
-
Search the load balancer access and error logs for "lbStatusCode" and "backendStatusCode."
-
If the results include backendStatusCode: 502, then:
Possible causes:
-
The backend is improperly configured.
-
The backend is likely another reverse proxy or load balancer.
Possible resolutions:
-
Examine the upstream proxy logs to determine why the 502 error is being returned.
-
Resolve any issues on the ultimate backend that are causing the upstream proxy to return a 502 error.
-
If the results include backendStatusCode: 504, then:
Possible causes:
-
When a 504 error occurs from the backend, it typically indicates that the backend is another proxy or load balancer service instance. The error typically occurs when a proxy is unable to connect to an upstream server in a specified amount of time.
-
Examine the logs of the upstream system to determine what is preventing the upstream proxy from connecting to the backend.
Possible resolutions:
-
Increase the amount of time for the connection timeout.
-
Determine why the backend is taking longer to respond than usual using a utility, such as tcpdump, and built-in application tools.
-
If the results include backendStatusCode: 500, then:
Possible causes:
-
When a 500 error occurs from the backend, it typically indicates a server-side error, commonly known as an "Internal Server Error." Backend applications typically cause this error.
-
The backend cannot connect to upstream resources, such as databases, APIs, and services.
Possible resolutions:
-
Resolve the application-level issue that is causing the error.
-
If the results include backendStatusCode: with no error code, then:
-
Typically, when no backend status code accompanies lbStatusCode: 502, no backend is available to receive the connections.
-
You might also notice a No healthy backends available in associated backendSet message in the load balancer error logs.
-
Ensure that the backends are healthy. If the backends are healthy, then confirm that the health check is properly configured.
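The decision steps above can be summarized in a small shell helper. This is a minimal sketch, not an Oracle tool: the function name suggest_next_step and the wording of each hint are assumptions made for illustration. The function expects the backendStatusCode value that you have already read from an access log entry for a request that failed with a 502 error.

```shell
# Hypothetical helper (not part of any Oracle tooling): given the
# backendStatusCode value from a load balancer access log entry,
# print the first thing to check, following the steps above.
suggest_next_step() {
    case "$1" in
        502) echo "examine the upstream proxy logs" ;;
        504) echo "check upstream connectivity and the connection timeout" ;;
        500) echo "check the backend application and its dependencies" ;;
        "")  echo "check backend health and the health check configuration" ;;
        *)   echo "no specific guidance for backend status $1" ;;
    esac
}
```

For example, calling suggest_next_step "" with an empty backend status code prints the health check hint, which corresponds to the case where no backend is available.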
Debugging a Backend Server Timeout
Learn how to debug a timeout error associated with a backend server used by a Load Balancer resource.
When the backend server takes longer than the allowed time to respond to a request, a 504 error occurs, indicating that the backend server is either down or not responding to the request forwarded by the load balancer. The client application receives the following response code: HTTP/1.1 504 Gateway Timeout.
Errors can occur for the following reasons:
-
The load balancer failed to establish a connection to the backend server before the connection timeout expired.
-
The load balancer established a connection to the backend server but the backend did not respond before the idle timeout period elapsed.
-
The security lists or network security groups for the subnet or the VNIC did not allow traffic between the load balancer and the backends.
-
The backend server or application server failed.
Follow these steps to troubleshoot the backend server timeout errors:
-
Use the curl utility to directly test the backend server from a host in the same network.
curl -i http://backend_ip_address
If this test takes longer than one second to respond, an application-level issue is likely causing latency. Oracle recommends that you check any upstream dependencies that might cause latency, including:
-
Network attached storage such as iSCSI or NFS
-
Database latency
-
An off-premises API
-
An application tier
-
Check the application by accessing it directly from the backend server. Check its access logs to determine if the application can be accessed and is functioning properly.
-
If the load balancer and the backend server are in different subnets, then check whether the security lists contain rules to allow traffic. If no rules exist, then traffic is not allowed.
-
Enter the following commands to determine whether firewall rules exist on the backend servers that block traffic:
sudo iptables -L
lists all firewall rules enforced by iptables
sudo firewall-cmd --list-all
lists all firewall rules enforced by firewalld
-
Enable logging on the load balancer to determine whether the load balancer or the backend server is causing the latency.
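To apply the one-second guideline from the first step, you can have curl report how long a request took. The following is a minimal sketch: the function names time_request and is_slow are hypothetical, the 30-second cap is an arbitrary assumption, and the only curl feature relied on is the time_total variable of the --write-out (-w) option.

```shell
# Hypothetical helper: print the total time, in seconds, that curl needed
# to fetch a URL. The response body is discarded.
time_request() {
    curl -s -o /dev/null --max-time 30 -w '%{time_total}\n' "$1"
}

# Hypothetical helper: succeed (exit status 0) when the elapsed time in
# seconds ($1) is greater than a threshold ($2, default 1 second).
is_slow() {
    awk -v t="$1" -v limit="${2:-1}" 'BEGIN { exit !(t + 0 > limit + 0) }'
}
```

A possible use, run from a host in the same network as the backend:
elapsed=$(time_request http://backend_ip_address) && is_slow "$elapsed" && echo "backend took ${elapsed}s"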
Testing TCP and HTTP Backend Servers
Learn how to test the TCP and HTTP backend servers used by a Load Balancer resource.
This topic describes how to troubleshoot a load balancer connection. The topology used in this procedure has a public load balancer in a public subnet, with the backends in the same subnet.
Oracle recommends that you use the Oracle Cloud Infrastructure Logging service to troubleshoot issues. (See Details for Load Balancer Logs.)
In addition to using Oracle Cloud Infrastructure logging, however, you can use other utilities listed in this section to troubleshoot the traffic that is processed by the load balancer and sent to a backend. To perform these tests, Oracle recommends that you create an instance in the same network as your load balancer and allow the traffic in the same network security groups and security lists. Use the following tools to troubleshoot:
-
ping
Before using the more advanced utilities listed here, Oracle recommends that you perform a basic ping test. For this test to succeed, you must allow ICMP traffic between the test instance and the backend.
$ ping backend_ip_address
The response should look similar to:
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.044 ms
If you receive a message that contains "64 bytes from...", then the ping succeeded.
Receiving a message that contains "Destination Host Unreachable" typically indicates that no system is reachable at that address.
Receiving no message typically indicates that the system exists but the ICMP protocol is not allowed. Check all firewalls, security lists, and network security groups to ensure ICMP is allowed.
-
curl
Use the curl utility to send HTTP requests to a specific host, port, or URL.
-
The following example shows using curl to connect to a backend that is sending a 403 Forbidden error:
$ curl -I http://backend_ip_address/health
HTTP/1.1 403 Forbidden
Date: Tue, 17 Mar 2021 17:47:10 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 3539
Connection: keep-alive
Last-Modified: Tue, 10 Mar 2021 20:33:28 GMT
ETag: "dd3-5b3c6975e7600"
Accept-Ranges: bytes
In the preceding example, the health check fails, returning a 403 error, indicating that the backend does not have local file permissions configured properly for the health check page.
-
The following example shows using curl to connect to a backend that is sending a 404 Not Found error:
$ curl -I http://backend_ip_address/health
HTTP/1.1 404 Not Found
Date: Tue, 17 Mar 2021 17:47:10 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 3539
Connection: keep-alive
Last-Modified: Tue, 10 Mar 2021 20:33:28 GMT
ETag: "dd3-5b3c6975e7600"
Accept-Ranges: bytes
In the preceding example, the health check fails, returning a 404 error, indicating that the health check page does not exist in the expected location.
-
The following example shows a backend that exists, but a network security group, the security lists, or a local firewall is blocking the traffic:
$ curl -I backend_ip_address
curl: (7) Failed connect to backend_ip_address:port; Connection refused
-
The following example shows a backend that does not exist:
$ curl -I backend_ip_address
curl: (7) Failed connect to backend_ip_address:port; No route to host
-
Netcat
Netcat is a networking utility for reading from and writing to network connections using TCP or UDP.
-
The following example shows using the netcat utility at the TCP level to ensure that the destination backend server can receive a connection:
$ nc -vz backend_ip_address port
Ncat: Connected to backend_ip_address:port.
In the preceding example, port is open for connections.
-
The following example shows a connection attempt that times out:
$ nc -vn backend_ip_address port
Ncat: Connection timed out.
In the preceding example, port is closed or traffic to it is blocked.
-
Tcpdump
Use the tcpdump utility to capture all traffic to a backend to determine which traffic is coming from a load balancer and what is being returned to the load balancer.
sudo tcpdump -i any -A port port and src load_balancer_ip_address
11:25:54.799014 IP 192.0.2.224.39224 > 192.0.2.224.80: Flags [P.], seq 1458768667:1458770008, ack 2440130792, win 704, options [nop,nop,TS val 461552632 ecr 208900561], length 1341: HTTP: POST /health HTTP/1.1
-
OpenSSL
When troubleshooting SSL issues between the load balancer instance and the backend servers, Oracle recommends using the openssl utility. This utility opens an SSL connection to a specific host name and port, and prints the SSL certificate and other parameters. Other options for troubleshooting issues are:
-
-showcerts
This option prints all certificates in the certificate chain presented by the backend server. Use this option to identify issues, such as a missing intermediate certificate authority certificate.
-
-cipher cipher_name
This option forces the client and server to use a specific cipher suite and helps to determine whether the backend allows specific ciphers.
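The text above names the openssl options but does not show a full command. The following is a minimal sketch that assumes the s_client subcommand of openssl is used for the test connection. The function names show_backend_chain and count_certs are hypothetical. Counting PEM blocks is only a rough indicator: a count of one for a backend whose certificate was issued by an intermediate certificate authority may point to a missing intermediate certificate.

```shell
# Hypothetical helper: print what a backend presents during the handshake,
# including every certificate in the chain (-showcerts). Redirecting stdin
# from /dev/null makes s_client exit after the handshake completes.
show_backend_chain() {
    openssl s_client -connect "$1:$2" -showcerts < /dev/null 2>/dev/null
}

# Hypothetical helper: count the PEM certificates found in text read from
# standard input, for example the output of show_backend_chain.
count_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----'
}
```

A possible use is show_backend_chain backend_ip_address port | count_certs. To test a single cipher suite instead, the -cipher option described above can be added to the same s_client command, for example openssl s_client -connect backend_ip_address:port -cipher cipher_name.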
-
Netstat
Use the netstat -natp command to ensure that the application running on the backend server is up and running. For TCP or HTTP traffic, the backend application must be listening on the expected IP address and port. If the application port on the backend server is not in listen mode, then the TCP port of the application is not up. To resolve this issue, ensure that the application is up and running by either restarting the application or the backend server.
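The listen-mode check can be scripted by filtering the netstat output. The following is a minimal sketch: the function name is_listening is hypothetical, and the column positions assume the Linux net-tools layout, in which the local address is the fourth column and the state is the sixth.

```shell
# Hypothetical helper: read `netstat -nat` style output on standard input
# and succeed (exit status 0) when a socket is in LISTEN state on the
# given port. Assumes local address in column 4 and state in column 6.
is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}
```

A possible use on the backend server is sudo netstat -natp | is_listening 80 && echo "port 80 is in listen mode".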
Common Load Balancer Errors
Learn about common load balancer errors associated with the Load Balancing service.
Common load balancer errors include series 500 and series 400 errors, health check errors, client errors, and SSL errors. The subsequent topics in this section describe these common errors and detail troubleshooting procedures for each, where applicable.
Server Errors (500-599)
Learn about common load balancer server errors (500-599) associated with the Load Balancing service.
Error messages:
-
lbStatusCode: "504"
-
backendStatusCode: ""
Oracle Cloud Infrastructure log category: Access log
Symptoms:
The client fails with a 504 error.
Possible causes:
The load balancer is not able to establish connections with any of the backends, even though the health check is marking the backends as available.
Possible solutions:
Configure the health check correctly.
Troubleshooting documentation: Editing a Load Balancer's Health Check Policies
Error messages:
-
lbStatusCode: "502"
-
backendStatusCode: "502"
Oracle Cloud Infrastructure log category: Access log and error log
Symptoms:
-
The client fails with a 502 Bad Gateway error.
-
The backend health check succeeds.
-
The backend returns a 502 error.
Possible causes:
-
An application on the backend is returning a 502 error.
-
The backend is configured incorrectly.
-
The backend is likely another reverse proxy or load balancer.
Possible solutions:
Examine the backend application logs to determine why a 502 error is returned.
Troubleshooting documentation: Troubleshooting an HTTP 502 Bad Gateway Error and Testing TCP and HTTP Backend Servers.
Error messages:
-
lbStatusCode: "502"
-
backendStatusCode: ""
-
No healthy backends available in associated backend set
Oracle Cloud Infrastructure log category: Access log and error log
Symptoms:
-
The client fails with a 502 Bad Gateway error.
-
The backend health check fails.
-
No traffic is observed to a specific backend or all backends.
Possible causes:
-
A backend application is not responding to the health check with the expected response.
-
If no error occurs from the backend, then a TCP health check is configured.
-
A single backend or all backends are configured in drain mode.
Possible solutions:
-
Determine why the TCP health check is failing.
-
Convert to an HTTP health check.
-
Change the drain mode to false (undrain) for a given backend or all backends.
Troubleshooting documentation: Troubleshooting an HTTP 502 Bad Gateway Error and Testing TCP and HTTP Backend Servers.
Error: Backend ip_address abruptly closes connection.
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The client fails with a 502 Bad Gateway error.
-
The client reports IO error in load balancer metrics.
Possible causes:
The backend connection timeout is configured incorrectly, with a lower timeout value than the load balancer.
Possible solutions:
-
Determine why the backend application is timing out.
-
If the backend timeout value needs to be adjusted, then adjust it to be greater than the load balancer timeout value.
Troubleshooting documentation: Testing TCP and HTTP Backend Servers
Persistence selected backend ip_address which failed and no_fallback is selected
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The client fails with a 502 Bad Gateway error.
-
Session persistence is failing.
Possible causes:
-
The backend set is configured with session persistence and the expected backend is not available because the connection failed or timed out.
-
The fallback option is disabled.
Possible solutions:
-
Determine why the backend application is not reachable.
-
Enable the fallback option in case the selected server is unavailable.
Troubleshooting documentation: Fallback
For all other 5nn errors, the most likely causes are issues with the backend server.
Client Errors (400-499)
Learn about common load balancer client errors (400-499) associated with the Load Balancing service.
Error messages:
-
lbStatusCode: "400"
-
backendStatusCode: ""
-
400 bad request header or cookie too large
Oracle Cloud Infrastructure log category: Access log
Symptoms:
-
The load balancer returns a status code 400.
-
The backend does not return a status code.
Possible causes:
The client is sending a request that exceeds the configured buffer size.
Possible solutions:
Increase the HTTP request header size on the load balancer. By default, the size limit is 8 KB; raising it to 64 KB typically resolves the issue.
Troubleshooting documentation: HTTP Header Rules
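Before changing the header size, it can help to confirm that the request headers really are larger than the limit. The following is a minimal sketch: the function name headers_exceed_limit and the file name request_headers.txt are hypothetical, 1 KB is assumed to be 1024 bytes, and the check treats all headers together as one block, which might not match exactly how the load balancer accounts for the buffer.

```shell
# Hypothetical helper: succeed (exit status 0) when the raw request headers
# read from standard input are larger than a limit given in KB.
# The default of 8 corresponds to the default size limit mentioned above.
headers_exceed_limit() {
    limit_kb="${1:-8}"
    bytes=$(wc -c | tr -d ' ')
    [ "$bytes" -gt $((limit_kb * 1024)) ]
}
```

A possible use, assuming the headers of a failing request were saved to a file, is headers_exceed_limit 8 < request_headers.txt && echo "headers exceed 8 KB".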
Error messages:
-
lbStatusCode: "404"
-
backendStatusCode: "404"
Oracle Cloud Infrastructure log category: Access log
Symptoms:
-
The load balancer returns a 404 status code.
-
The backend returns a 404 status code.
Possible causes:
The expected page does not exist on the backend.
Possible solutions:
-
Create the missing page.
-
Configure the client to call the correct page.
Error messages:
-
lbStatusCode: "403"
-
backendStatusCode: "403"
Oracle Cloud Infrastructure log category: Access log
Symptoms:
-
The load balancer returns a 403 status code.
-
The backend returns a 403 status code.
Possible causes:
-
The expected page does not have sufficient permissions on the backend.
-
The expected authentication token is missing or is not being forwarded.
Possible solutions:
-
Create the missing permissions on the backend.
-
Adjust the client configuration to ensure that tokens are sent properly.
-
Ensure that all tokens being sent are arriving at the backend.
-
If the header is missing:
-
Adjust the header size on the load balancer or the client.
-
Allow headers with special characters.
Troubleshooting documentation: HTTP Header Rules
Health Check Errors
Learn about health check errors associated with Load Balancer resources.
No healthy backends available in associated backendSet
Oracle Cloud Infrastructure log category: Error log
Symptoms:
The client fails with a 502 Bad Gateway error.
Possible causes:
-
No backends in the backend set.
-
No backends responding to health check.
Possible solutions:
-
Determine why backends are not responding to health check.
-
Check and adjust any health check settings, including status code, regular expressions, interval, timeout, port, and protocol.
Troubleshooting documentation: Editing a Load Balancer's Health Check Policies
Backend health status failure reason: Status code Mismatch
Oracle Cloud Infrastructure category: Backend Health Status
"msg":"invalid statusCode","statusCode":XXX,"expected":"200"
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The backend fails the health check.
-
The client fails with a 502 Bad Gateway error.
-
invalid statusCode appears in the error logs.
Possible causes:
-
The backend is responding with an incorrect response code.
-
The backend health check fails because of response code mismatch.
-
The health check fails because of an unexpected status code in the regular expression body.
Possible solutions:
-
Determine why the backend is sending the incorrect response code.
-
Adjust the path or status code of the health check to match the backend.
Troubleshooting documentation: Editing a Load Balancer's Health Check Policies
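To see which status codes the backend actually returned, you can extract the two values from the error log entries. The following is a minimal sketch: the function name status_mismatches is hypothetical, and the pattern assumes that the statusCode and expected fields appear in the order shown in the message above.

```shell
# Hypothetical helper: read error log lines on standard input and print
# "actual expected" for every entry that reports a status code mismatch.
# Lines without both fields produce no output.
status_mismatches() {
    sed -n 's/.*"statusCode":\([0-9][0-9]*\).*"expected":"\([0-9][0-9]*\)".*/\1 \2/p'
}
```

Each output line contains the status code that the backend returned followed by the status code that the health check expects, which shows whether to fix the backend or to adjust the health check.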
Backend Health Status Failure Reason: Regular expression mismatch
Oracle Cloud Infrastructure category: Backend Health Status
"response match result: failed"
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The backend fails the health check.
-
The client fails with a 502 Bad Gateway error.
-
"response match result: failed" appears in the error logs.
Possible causes:
The backend health check fails because of regular expression mismatch, incorrect value returned, or incorrect value provided to the health check.
Possible solutions:
-
Determine why the backend is sending the incorrect body.
-
Adjust the path or regular expression pattern of the health check to match the backend.
Troubleshooting documentation: Editing a Load Balancer's Health Check Policies
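A quick local test can show whether the response body matches the pattern at all. The following is a minimal sketch: the function name body_matches is hypothetical, and it uses grep extended regular expressions, which is an assumption. The load balancer might interpret some patterns differently, so treat the result as a hint only.

```shell
# Hypothetical helper: succeed (exit status 0) when the response body read
# from standard input matches the extended regular expression in $1.
body_matches() {
    grep -Eq -e "$1"
}
```

A possible use from a host in the same network is curl -s http://backend_ip_address/health | body_matches 'pattern' && echo "body matches", where pattern is the regular expression configured in the health check.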
Backend Health Status Failure Reason: Connection failed
Oracle Cloud Infrastructure category: Backend Health Status
"errno":"EHOSTUNREACH","syscall":"connect"
"ECONNREFUSED","errno":"ECONNREFUSED"
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The backend fails the health check.
-
The client fails with a 502 Bad Gateway error.
-
"EHOSTUNREACH" appears in error logs.
Possible causes:
-
The backend health check fails because of an unreachable host.
-
The backend health check fails because of a connection reset.
-
An application or firewall is actively refusing the connection.
Possible solutions:
-
Check the local instance firewall to confirm that traffic is being allowed.
-
Check the local instance to confirm that the application is running.
-
Check the network security group and security lists to confirm that traffic is allowed.
Troubleshooting documentation: Access and Security
"healthStatus":"Unhealthy to Healthy"
"healthStatus":"Healthy to Unhealthy"
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The client behaves as expected but fails periodically.
-
The backend switches between passing and failing the health check.
-
"Unhealthy to Healthy" or "Healthy to Unhealthy" appears in error logs.
Possible causes:
-
An unhealthy backend becomes healthy.
-
If the health status of the backend changes often, it can indicate a chronic problem.
Possible solutions:
-
Ensure that the instance is not changing health status abnormally.
-
Check application logs on the backend server for any application-specific issues.
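To judge whether a backend changes health status abnormally often, you can count the transitions in an exported error log. The following is a minimal sketch: the function name count_unhealthy_transitions is hypothetical, and the match string is the healthStatus message shown above, assumed to appear verbatim in each log line.

```shell
# Hypothetical helper: count the "Healthy to Unhealthy" transitions in
# error log lines read from standard input. grep -c exits non-zero when
# the count is 0, so "|| true" keeps the function's exit status at 0.
count_unhealthy_transitions() {
    grep -c -e '"healthStatus":"Healthy to Unhealthy"' || true
}
```

A high count over a short period suggests the chronic problem described above rather than a one-time failure.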
Backend Health Status Failure Reason: Timed out
Oracle Cloud Infrastructure category: Backend Health Status
"msg":"connect timed out","elapsed":3000}
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The client fails with a 502 Bad Gateway error.
-
The backend is periodically or chronically failing health checks.
-
"connect timed out" appears in the error logs.
Possible causes:
-
The backend server is not responding to health checks in the expected time period.
-
A slow upstream dependency, including a database, an application service or API, or slow storage services, such as the Oracle Cloud Infrastructure File Storage service, Block Volume, or Object Storage.
Possible solutions:
-
Perform a local test to the backend to eliminate the load balancer as a cause.
-
Check the performance of all upstream dependencies.
-
Check application logs on the backend server for any dependencies reporting any sort of timeout.
Troubleshooting documentation: Testing TCP and HTTP Backend Servers.
SSL Errors
Learn about Secure Sockets Layer (SSL) errors associated with Load Balancer resources.
Not all SSL virtual listeners on port 443 have the same set of SSL protocols defined
Symptoms:
You cannot create backends for an existing load balancer, nor can you add new servers to the backend created previously within the same load balancer.
Possible causes:
Mismatch of transport layer security (TLS) versions.
Possible solutions:
Match TLS versions on the listeners.
Troubleshooting documentation: SSL Certificate for Load Balancers
(SSL: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol) while SSL handshake error
Oracle Cloud Infrastructure log category: Client log
Symptoms:
The client experiences SSL handshake failures in Oracle Cloud Infrastructure metrics (see Load Balancing Metrics).
Possible causes:
The backend is not configured to accept SSL.
Possible solutions:
-
Confirm that the backend certificate matches the certificate authority that is provided.
-
Ensure that all certificates in the chain are provided in the correct order in the Certificate field.
-
Ensure that you provide the correct certificate depth.
Troubleshooting documentation: SSL Certificate for Load Balancers
Peer backend_ip_address closed connection in SSL handshake
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The client fails with a 502 Bad Gateway error.
-
The client experiences SSL handshake failures in Oracle Cloud Infrastructure metrics (see Load Balancing Metrics).
Possible causes:
-
The backend is not configured to accept SSL.
-
The backend certificate is invalid.
Possible solutions:
-
Confirm that the backend certificate matches the certificate authority that is provided.
-
Ensure that all certificates in the chain are provided in the correct order in the Certificate field.
-
Ensure that you provide the correct certificate depth.
Troubleshooting documentation: SSL Certificate for Load Balancers
Error: Client backend_ip_address has SSL certificate verify error.
Oracle Cloud Infrastructure log category: Error log
Symptoms:
The client experiences SSL handshake failures in Oracle Cloud Infrastructure metrics (see Load Balancing Metrics).
Possible causes:
-
The client certificate is invalid.
-
The client certificate is not trusted.
-
The peer certificate verification depth is invalid.
Possible solutions:
-
Ensure that the client certificate is valid.
-
Remove the Peer Cert Verify feature on the listener.
Troubleshooting documentation: Key Pair Mismatch and Private Key Consistency.
Client backend_ip_address sent no required SSL certificate
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The client experiences a 400 Response error.
-
no required SSL certificate appears in error logs.
Possible causes:
The client is not sending a client certificate.
Possible solutions:
-
Update the client to send the correct client certificate.
-
Remove the Peer Cert Verify feature on the listener.
-
Adjust the certificate verification depth.
Troubleshooting documentation: Configuring Peer Certificate Verification.
"code":"EPROTO","errno":"EPROTO"
Oracle Cloud Infrastructure log category: Error log
Symptoms:
The backend health check fails because of the SSL error.
Possible causes:
The backend is configured to accept SSL but the health check protocol selected does not match that of the backend.
Possible solutions:
Confirm that you are not using a non-TLS health check on a backend that has TLS enabled. The health check protocol must match the protocol that the backend accepts.
Troubleshooting documentation: Editing a Load Balancer's Health Check Policies
SSL host name verification failed for host_name
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The client fails with a 502 Bad Gateway error.
-
Error message contains SSL host name verification failed.
Possible causes:
Host name provided does not match what is expected.
Possible solutions:
-
Configure client to use the expected host name.
-
Configure certificate to match the host name sent by the client.
Troubleshooting documentation: SSL Certificate for Load Balancers
Client-Side Errors
Learn about client-side errors associated with Load Balancer resources.
Error: Access for client_ip_address denied by HTTP ACL rule.
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The client fails with a 502 Bad Gateway error.
-
The backend does not pass health check.
-
forbidden by HTTP ACL rule appears in the error log.
Possible causes:
Access control rule set is enabled but does not include the source IP address.
Possible solutions:
Check the respective rule set and update it to include the source IP address.
Troubleshooting documentation: Access Control Rules
Error: Client client_name timed out
Oracle Cloud Infrastructure log category: Error log
Symptoms:
-
The client fails with a 502 Bad Gateway error.
-
The client experiences SSL handshake failures in Oracle Cloud Infrastructure metrics (see Load Balancing Metrics).
Possible causes:
The client terminated the connection sooner than the configured timeout for the load balancer.
Possible solutions:
-
Configure client timeout to match expected application configuration.
-
Determine why the backend did not respond in the configured amount of time.
Troubleshooting documentation: Testing TCP and HTTP Backend Servers.