Troubleshooting Load Balancing

Describes troubleshooting methods for addressing load balancer issues.

Enable Oracle Cloud Infrastructure load balancing access and error logs to troubleshoot common errors. See Logging for Load Balancers for more information.

Troubleshooting an HTTP 502 Bad Gateway Error

Learn how to troubleshoot an HTTP 502 bad gateway error for a Load Balancer resource.

In addition to monitoring and management, load balancing logging helps you identify, isolate, and troubleshoot issues with your load balancer infrastructure. The following procedure illustrates how to troubleshoot a 502 Bad Gateway error encountered when deploying a new web application, example.com. The example uses an Oracle Cloud Infrastructure public load balancer as the front end in a development environment. Requests to the application fail with a 502 Bad Gateway error in the browser. Troubleshoot the issue using the load balancer access and error logs, as follows:

  1. Confirm the error using the curl utility, as follows:

    curl -v http://example.com
    > GET / HTTP/1.1
    > Host: example.com
    > User-Agent: curl/7.54.0
    > Accept: */*
    >
    < HTTP/1.1 502 Bad Gateway
    < Content-Type: text/html
    < Content-Length: 161
    < Connection: keep-alive

  2. Search the load balancer access and error logs for the lbStatusCode and backendStatusCode fields.

    • If the results include backendStatusCode: 502, then:

      Possible causes:

      • The backend is configured incorrectly.

      • The backend is likely another reverse proxy or load balancer.

      Possible resolutions:

      • Examine upstream proxy logs to determine why the 502 error is being returned.

      • Resolve any issues on the ultimate backend that are causing the upstream proxy to return a 502 error.

    • If the results include backendStatusCode: 504, then:

      Possible causes:

      • A 504 error from the backend typically indicates that the backend is another proxy or load balancer service instance. The error typically occurs when a proxy is unable to connect to an upstream server within a specified amount of time.

      Possible resolutions:

      • Examine the logs of the upstream system to determine what is preventing the upstream proxy from connecting to the backend.

      • Increase the amount of time for the connection timeout.

      • Determine why the backend is taking longer than usual to respond by using a utility, such as tcpdump, and built-in application tools.

    • If the results include backendStatusCode: 500, then:

      Possible causes:

      • A 500 error from the backend typically indicates a server-side error, commonly known as an "Internal Server Error." Backend applications typically cause this error.

      • The backend application is unable to connect to upstream resources, such as databases, APIs, and services.

      Possible resolutions:

      • Resolve the application-level issue that is causing the error.

    • If the results include backendStatusCode with no status code (an empty value), then:

      • Typically, when no backend status code accompanies lbStatusCode: 502, no backend is available to receive the connections.

      • You might also notice a "No healthy backends available in associated backendSet" message in the load balancer error logs.

      • Ensure that the backends are healthy. If the backends are healthy, then confirm that the health check is properly configured.
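
The decision steps above can be sketched as a small lookup. The following Python sketch assumes that each access log entry has already been parsed into a dictionary that carries a backendStatusCode field as a string. The function name and the hint strings are illustrative only and are not part of the Load Balancing service.

```python
# Hints keyed by backendStatusCode, following the cases described above.
# An empty string stands for "no backend status code was logged".
HINTS = {
    "502": "backend is likely another proxy: examine the upstream proxy logs",
    "504": "upstream proxy timed out: check the connection timeout and backend latency",
    "500": "application error on the backend: check the application and its dependencies",
    "": "no backend available: check backend health and the health check configuration",
}


def hint_for_entry(entry):
    """Return a troubleshooting hint for one parsed access log entry.

    `entry` is assumed to be a dict; a missing or null backendStatusCode
    is treated the same as an empty one.
    """
    backend_status = str(entry.get("backendStatusCode") or "").strip()
    return HINTS.get(backend_status, "no specific guidance: inspect the backend logs")


if __name__ == "__main__":
    sample = {"lbStatusCode": "502", "backendStatusCode": ""}
    print(hint_for_entry(sample))
```

The lookup only narrows down where to look next; the logs of the backend or upstream proxy remain the source of the actual cause.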

Debugging a Backend Server Timeout

Learn how to debug a timeout error associated with a backend server used by a Load Balancer resource.

When a backend server takes longer than the configured timeout to respond to a request, a 504 error occurs, indicating that the backend server is either down or not responding to the request forwarded by the load balancer. The client application receives the following response code: HTTP/1.1 504 Gateway Timeout.

Errors can occur for the following reasons:

  • The load balancer failed to establish a connection to the backend server before the connection timeout expired.

  • The load balancer established a connection to the backend server but the backend did not respond before the idle timeout period elapsed.

  • The security lists or network security groups for the subnet or the VNIC did not allow traffic from the backends to the load balancer.

  • The backend server or application server failed.

Follow these steps to troubleshoot the backend server timeout errors:

  1. Use the curl utility to directly test the backend server from a host in the same network.

    curl -i http://backend_ip_address

    If this test takes longer than one second to respond, an application-level issue is likely causing latency. Oracle recommends that you check any upstream dependencies that might cause latency, including:

    • Network-attached storage, such as iSCSI or NFS

    • Database latency

    • An off-premises API

    • An application tier

  2. Check the application by accessing it directly from the backend server. Check its access logs to determine if the application can be accessed and is functioning properly.

  3. If the load balancer and the backend server are in different subnets, then check whether the security lists contain rules to allow traffic. If no rules exist, then traffic is not allowed.

  4. Enter the following commands to determine whether firewall rules exist on the backend servers that block traffic:

    • sudo iptables -L

      Lists all firewall rules enforced by iptables.

    • sudo firewall-cmd --list-all

      Lists all firewall rules enforced by firewalld.

  5. Enable logging on the load balancer to determine whether the load balancer or the backend server is causing the latency.
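
The timing test in step 1 can also be scripted when you need to repeat it. The following Python sketch uses only the standard library and applies the one-second guideline from step 1. The function name, the default values, and the opener parameter (a hook that makes the function easy to exercise without a live backend) are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def check_backend_latency(url, threshold_s=1.0, timeout_s=10.0,
                          opener=urllib.request.urlopen):
    """Time one GET request and report whether it exceeded the threshold.

    Returns (status, elapsed_seconds, slow). `status` is None when the
    request failed before any HTTP response was received.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code   # the backend answered, but with an error status
    except OSError:
        status = None       # connection failed or timed out (URLError is an OSError)
    elapsed = time.monotonic() - start
    return status, elapsed, elapsed > threshold_s


if __name__ == "__main__":
    # Replace the placeholder with the address of a backend server.
    print(check_backend_latency("http://backend_ip_address/"))
```

Run the script from a host in the same network as the backend so that the measurement excludes the load balancer.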

Testing TCP and HTTP Backend Servers

Learn how to test the TCP and HTTP backend servers used by a Load Balancer resource.

This topic describes how to troubleshoot a load balancer connection. The topology used in this procedure has a public load balancer in a public subnet and the backends are in the same subnet.

Oracle recommends that you use the Oracle Cloud Infrastructure Logging service to troubleshoot issues. (See Details for Load Balancer Logs.)

In addition to using Oracle Cloud Infrastructure logging, however, you can use other utilities listed in this section to troubleshoot the traffic that is processed by the load balancer and sent to a backend. To perform these tests, Oracle recommends that you create an instance in the same network as your load balancer and allow the traffic in the same network security groups and security lists. Use the following tools to troubleshoot:

  • ping

    Before using the more advanced utilities listed here, Oracle recommends that you perform a basic ping test. For this test to succeed, you must allow ICMP traffic between the test instance and the backend.
    $ ping backend_ip_address
    The response should look similar to:
    PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
    64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.028 ms
    64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.044 ms

    If you receive a message that contains "64 bytes from...", then the ping succeeded.

    Receiving a message that contains "Destination Host Unreachable" typically indicates that no host is reachable at that address, for example because the system does not exist.

    Receiving no message indicates that the system exists but the ICMP protocol is not allowed. Check all firewalls, security lists, and network security groups to ensure ICMP is allowed.

  • curl

    Use the curl utility to send HTTP requests to a specific host, port, or URL.

    • The following example shows using curl to connect to a backend that is sending a 403 Forbidden error:

      $ curl -I http://backend_ip_address/health
      HTTP/1.1 403 Forbidden
      Date: Tue, 17 Mar 2021 17:47:10 GMT
      Content-Type: text/html; charset=UTF-8
      Content-Length: 3539
      Connection: keep-alive
      Last-Modified: Tue, 10 Mar 2021 20:33:28 GMT
      ETag: "dd3-5b3c6975e7600"
      Accept-Ranges: bytes

      In the preceding example, the health check fails, returning a 403 error, indicating that the backend does not have local file permissions configured properly for the health check page.

    • The following example shows using curl to connect to a backend that is sending a 404 Not Found error:

      $ curl -I http://backend_ip_address/health
      HTTP/1.1 404 Not Found
      Date: Tue, 17 Mar 2021 17:47:10 GMT
      Content-Type: text/html; charset=UTF-8
      Content-Length: 3539
      Connection: keep-alive
      Last-Modified: Tue, 10 Mar 2021 20:33:28 GMT
      ETag: "dd3-5b3c6975e7600"
      Accept-Ranges: bytes

      In the preceding example, the health check fails, returning a 404 error, indicating that the health check page does not exist in the expected location.

    • The following example shows a backend that exists, but a network security group, a security list, or a local firewall is blocking the traffic:

      $ curl -I backend_ip_address
      curl: (7) Failed connect to backend_ip_address:port; Connection refused

    • The following example shows a backend that does not exist:

      $ curl -I backend_ip_address
      curl: (7) Failed connect to backend_ip_address:port; No route to host

  • Netcat

    Netcat is a networking utility for reading from and writing to network connections using TCP or UDP.

    • The following example shows using the netcat utility at the TCP level to ensure that the destination backend server can receive a connection:

      $ nc -vz backend_ip_address port
      Ncat: Connected to backend_ip_address:port.

      In the preceding example, port is open for connections.

    • The following example shows a connection attempt that times out:

      $ nc -vn backend_ip_address port
      Ncat: Connection timed out.

      In the preceding example, port is closed or traffic to it is being blocked.

  • Tcpdump

    Use the tcpdump utility to capture traffic to a backend to determine which traffic is coming from the load balancer and what is being returned to the load balancer.

    sudo tcpdump -i any -A port port and src load_balancer_ip_address
    11:25:54.799014 IP 192.0.2.224.39224 > 192.0.2.224.80: Flags [P.], seq 1458768667:1458770008, ack 2440130792, win 704, options [nop,nop,TS val 461552632 ecr 208900561], length 1341: HTTP: POST /health HTTP/1.1

  • OpenSSL

    When troubleshooting SSL issues between the load balancer instance and the backend servers, Oracle recommends using the openssl utility. The openssl s_client command (for example, openssl s_client -connect backend_ip_address:port) opens an SSL connection to a specific host name and port, and prints the SSL certificate and other parameters.

    Options that help with troubleshooting include:

    • -showcerts

      This option prints all certificates in the certificate chain presented by the backend server. Use this option to identify issues, such as a missing intermediate certificate authority certificate.

    • -cipher cipher_name

      This option forces the client and server to use a specific cipher suite and helps determine whether the backend allows specific ciphers.

  • Netstat

    Use the netstat -natp command to ensure that the application running on the backend server is up and running. For TCP or HTTP traffic, the backend application, IP address, and port must all be in listen mode. If the application port on the backend server is not in listen mode, then the TCP port of the application is not up.

    To resolve this issue, ensure that the application is up and running by restarting either the application or the backend server.
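
The connection outcomes shown in the curl and Netcat examples above (connected, connection refused, no route to host, and connection timed out) can also be told apart in a script. The following Python sketch uses the standard socket module. The function names and the returned labels are illustrative, and the mapping from exception to label is an assumption about how these failures usually surface on Linux hosts.

```python
import errno
import socket


def classify_connect_error(exc):
    """Map a socket connection exception to a short diagnosis label."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timed out"      # often a filtered port or a silent firewall drop
    if isinstance(exc, ConnectionRefusedError):
        return "refused"        # nothing is listening, or a firewall rejects actively
    if isinstance(exc, OSError) and exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        return "unreachable"    # no route to the host
    return "error"


def probe_tcp(host, port, timeout_s=3.0):
    """Attempt a TCP connection, similar in spirit to `nc -vz host port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except OSError as exc:
        return classify_connect_error(exc)


if __name__ == "__main__":
    # Replace the placeholders with the backend address and port to test.
    print(probe_tcp("backend_ip_address", 80))
```

As with the other utilities, run the probe from an instance in the same network as the load balancer so that the same security lists and network security groups apply.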

Common Load Balancer Errors

Learn about common load balancer errors associated with the Load Balancing service.

Common load balancer errors include 500-series and 400-series errors, health check errors, client errors, and SSL errors. The subsequent topics in this section describe these common errors and detail troubleshooting procedures for each, where applicable.

Server Errors (500-599)

Learn about common load balancer server errors (500-599) associated with the Load Balancing service.

504

Error messages:

  • lbStatusCode: "504"

  • backendStatusCode: ""

Oracle Cloud Infrastructure log category: Access log

Symptoms:

The client fails with a 504 error.

Possible causes:

The load balancer is not able to establish connections with any of the backends, even though the health check is marking the backends as available.

Possible solutions:

Configure the health check correctly.

Troubleshooting documentation: Editing a Load Balancer's Health Check Policies

502, 502
Error messages:
  • lbStatusCode: "502"

  • backendStatusCode: "502"

Oracle Cloud Infrastructure log category: Access log and error log

Symptoms:
  • The client fails with a 502 Bad Gateway error.

  • The backend health check succeeds.

  • The backend returns a 502 error.

Possible causes:
  • An application on the backend is returning a 502 error.

  • The backend is configured incorrectly.

  • The backend is likely another reverse proxy or load balancer.

Possible solutions:

Examine the backend application logs to determine why a 502 error is returned.

Troubleshooting documentation: Troubleshooting an HTTP 502 Bad Gateway Error and Testing TCP and HTTP Backend Servers.

502
Error messages:
  • lbStatusCode: "502"

  • backendStatusCode: ""

  • No healthy backends available in associated backend set

Oracle Cloud Infrastructure log category: Access log and error log

Symptoms:
  • The client fails with a 502 Bad Gateway error.

  • The backend health check fails.

  • No traffic observed to a specific backend or all backends.

Possible causes:
  • A backend application is not responding to the health check with the expected response.

  • If no error occurs from the backend, then a TCP health check is configured.

  • A single backend or all backends are configured in drain mode.

Possible solutions:
  • Determine why the TCP health check is failing.

  • Convert to an HTTP health check.

  • Change the drain mode to false (undrain) for a given backend or all backends.

Troubleshooting documentation: Troubleshooting an HTTP 502 Bad Gateway Error and Testing TCP and HTTP Backend Servers.

Backend Connection Issue

Error:

Backend ip_address abruptly closes connection.

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The client fails with a 502 Bad Gateway error.

  • The client reports IO error in load balancer metrics.

Possible causes:

The backend connection timeout is configured incorrectly, with a lower timeout value than the load balancer.

Possible solutions:
  • Determine why the backend application is timing out.

  • If the backend timeout value needs to be adjusted, then adjust it to be greater than the load balancer timeout value.

Troubleshooting documentation: Testing TCP and HTTP Backend Servers

Session Persistence Issue
Error message:
Persistence selected backend ip_address which failed and no_fallback is selected

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The client fails with a 502 Bad Gateway error.

  • Session persistence is failing.

Possible causes:
  • The backend set is configured with session persistence and the expected backend is not available because the connection failed or timed out.

  • The fallback option is disabled.

Possible solutions:
  • Determine why the backend application is not reachable.

  • Enable the fallback option in case the selected server is unavailable.

Troubleshooting documentation: Fallback

For all other 5nn errors, the most likely causes are issues with the backend server.
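
When you review many error log lines, the messages quoted in this section can be matched with a few patterns. The following Python sketch builds its patterns only from the message texts shown above. The labels, the function name, and the loose matching of the IP address portion are assumptions for illustration, and actual log lines might wrap these messages in additional fields.

```python
import re

# (pattern, label) pairs derived from the error messages quoted in this section.
# \S+ stands in for the backend IP address that appears in the real message.
ERROR_PATTERNS = [
    (re.compile(r"No healthy backends available", re.IGNORECASE),
     "no healthy backends"),
    (re.compile(r"Backend \S+ abruptly closes connection", re.IGNORECASE),
     "backend connection issue"),
    (re.compile(r"Persistence selected backend \S+ which failed", re.IGNORECASE),
     "session persistence issue"),
]


def classify_error_message(message):
    """Return the label of the first pattern found in the message."""
    for pattern, label in ERROR_PATTERNS:
        if pattern.search(message):
            return label
    return "unclassified"


if __name__ == "__main__":
    print(classify_error_message("Backend 192.0.2.5 abruptly closes connection."))
```

Messages that come back as unclassified still need manual review; the sketch only sorts the known cases.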

Client Errors (400-499)

Learn about common load balancer client errors (400-499) associated with the Load Balancing service.

400
Error messages:
  • lbStatusCode: "400"

  • backendStatusCode: ""

  • 400 bad request header or cookie too large

Oracle Cloud Infrastructure log category: Access log

Symptoms:
  • The load balancer returns a status code 400.

  • The backend does not return a status code.

Possible causes:

The client is sending a request that exceeds the configured buffer size.

Possible solutions:

Increase the HTTP request header size on the load balancer. By default, the size limit is 8 KB; raising it, for example to 64 KB, typically resolves the issue.

Troubleshooting documentation: HTTP Header Rules
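
To check whether a client request is likely to exceed the limit, you can estimate the size of its headers. The following Python sketch adds up the serialized size of each header line and compares the total with the 8 KB default stated above. Exactly which bytes the load balancer counts toward the limit is not described here, so treat the result as an approximation; the function names are illustrative.

```python
DEFAULT_LIMIT_BYTES = 8 * 1024  # default header size limit stated above


def header_block_size(headers):
    """Approximate the size in bytes of the header lines of an HTTP/1.1 request.

    Each header is counted as it would be serialized: "Name: value\r\n".
    The request line and the final blank line are not included.
    """
    return sum(len(f"{name}: {value}\r\n".encode("utf-8"))
               for name, value in headers.items())


def exceeds_limit(headers, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return True when the estimated header size is larger than the limit."""
    return header_block_size(headers) > limit_bytes


if __name__ == "__main__":
    request_headers = {"Host": "example.com", "Cookie": "session=" + "x" * 9000}
    print(header_block_size(request_headers), exceeds_limit(request_headers))
```

Large cookies are a common reason for oversized headers, so the Cookie header is usually the first value to inspect.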

404, 404
Error messages:
  • lbStatusCode: "404"

  • backendStatusCode: "404"

Oracle Cloud Infrastructure log category: Access log

Symptoms:
  • The load balancer returns a 404 status code.

  • The backend returns a 404 status code.

Possible causes:

The expected page does not exist on the backend.

Possible solutions:
  • Create the missing page.

  • Configure the client to call the correct page.

403, 403
Error messages:
  • lbStatusCode: "403"

  • backendStatusCode: "403"

Oracle Cloud Infrastructure log category: Access log

Symptoms:
  • The load balancer returns a 403 status code.

  • The backend returns a 403 status code.

Possible causes:
  • Expected page does not have sufficient permission on the backend.

  • Expected authentication token is missing or not being forwarded.

Possible solutions:

  • Create missing permissions on backend.

  • Adjust client configuration to ensure that tokens are sent properly.

  • Ensure that all tokens being sent are arriving at the backend.

  • If the header is missing:

    • Adjust header size on the load balancer or the client.

    • Allow headers with special characters.

Troubleshooting documentation: HTTP Header Rules

Health Check Errors

Learn about health check errors associated with Load Balancer resources.

No Healthy Backends
Error message:
No healthy backends available in associated backendSet

Oracle Cloud Infrastructure log category: Error log

Symptoms:

The client fails with a 502 Bad Gateway error.

Possible causes:
  • No backends in the backend set.

  • No backends responding to health check.

Possible solutions:
  • Determine why backends are not responding to health check.

  • Check and adjust any health check settings, including status code, regular expressions, interval timeout, port, and protocol.

Troubleshooting documentation: Editing a Load Balancer's Health Check Policies

Status Code Issues

Backend Health Status Failure Reason: Status code mismatch

Oracle Cloud Infrastructure category: Backend Health Status

Error message:
"msg":"invalid statusCode","statusCode":XXX,"expected":"200"

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The backend fails the health check.

  • The client fails with a 502 Bad Gateway error.

  • invalid statusCode appears in the error logs.

Possible causes:
  • The backend is responding with an incorrect response code.

  • The backend health check fails because of response code mismatch.

  • The health check failures are because of an unexpected status code in the regular expression body.

Possible solutions:
  • Determine why the backend is sending the incorrect response code.

  • Adjust the path or status code of the health check to match the backend.

Troubleshooting documentation: Editing a Load Balancer's Health Check Policies

Response Match Failed

Backend Health Status Failure Reason: Regular expression mismatch

Oracle Cloud Infrastructure category: Backend Health Status

Error message:
"response match result: failed"

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The backend fails the health check.

  • The client fails with a 502 Bad Gateway error.

  • "response match result: failed" appears in the error logs.

Possible causes:

The backend health check fails because of regular expression mismatch, incorrect value returned, or incorrect value provided to the health check.

Possible solutions:
  • Determine why the backend is sending the incorrect body.

  • Adjust the path or regular expression pattern of the health check to match the backend.

Troubleshooting documentation: Editing a Load Balancer's Health Check Policies
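
The status code and regular expression checks described in the two preceding topics can be reproduced locally against a response that you captured from the backend, for example with curl. The following Python sketch is a simplified model, not the service's implementation: it assumes the pattern only needs to be found somewhere in the body (re.search), and the function name and reason strings are illustrative.

```python
import re


def evaluate_health_check(status_code, body, expected_status=200, body_regex=None):
    """Return (healthy, reason) for one captured backend response.

    The status code is compared first; the body pattern is only checked
    when a pattern is configured and the status code matched.
    """
    if status_code != expected_status:
        return False, "status code mismatch"
    if body_regex is not None and re.search(body_regex, body) is None:
        return False, "regular expression mismatch"
    return True, "ok"


if __name__ == "__main__":
    print(evaluate_health_check(200, "status: UP", body_regex=r"\bUP\b"))
    print(evaluate_health_check(403, "Forbidden"))
```

If the local evaluation passes but the backend still fails its health check, compare the path, port, and protocol of the health check with what you tested.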

Unreachable Host

Backend Health Status Failure Reason: Connection failed

Oracle Cloud Infrastructure category: Backend Health Status

Error messages:
"errno":"EHOSTUNREACH","syscall":"connect"
"ECONNREFUSED","errno":"ECONNREFUSED"

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The backend fails the health check.

  • The client fails with a 502 Bad Gateway error.

  • "EHOSTUNREACH" appears in error logs.

Possible causes:
  • The backend health check fails because of an unreachable host.

  • The backend health check fails because of a connection reset.

  • An application or firewall is actively refusing the connection.

Possible solutions:
  • Check the local instance firewall to confirm that traffic is being allowed.

  • Check the local instance to confirm that the application is running.

  • Check the network security group and security lists to confirm that traffic is allowed.

Troubleshooting documentation: Access and Security

Health Status Issues
Error messages:
"healthStatus":"Unhealthy to Healthy"
"healthStatus":"Healthy to Unhealthy"

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The client behaves as expected but fails periodically.

  • The backend switches between passing and failing the health check.

  • "Unhealthy to Healthy" or "Healthy to Unhealthy" appears in error logs.

Possible causes:
  • An unhealthy backend becomes healthy.

  • If the health status of the backend changes often, it can indicate a chronic problem.

Possible solutions:
  • Ensure that the instance is not changing health status abnormally.

  • Check application logs on the backend server for any application-specific issues.
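
To judge whether a backend changes health status abnormally often, you can count the transitions recorded in the error log over a period. The following Python sketch looks for the two healthStatus values quoted above in each log line. The threshold of four transitions is a hypothetical value chosen for illustration, not a service default.

```python
import re

# Matches the two transition messages quoted in this topic.
TRANSITION = re.compile(
    r'"healthStatus"\s*:\s*"(Unhealthy to Healthy|Healthy to Unhealthy)"'
)


def count_transitions(log_lines):
    """Count health status transitions of each kind in the given lines."""
    counts = {"Unhealthy to Healthy": 0, "Healthy to Unhealthy": 0}
    for line in log_lines:
        match = TRANSITION.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def is_flapping(log_lines, max_transitions=4):
    """Return True when the total number of transitions exceeds the threshold."""
    return sum(count_transitions(log_lines).values()) > max_transitions


if __name__ == "__main__":
    lines = ['{"healthStatus":"Healthy to Unhealthy"}',
             '{"healthStatus":"Unhealthy to Healthy"}']
    print(count_transitions(lines), is_flapping(lines))
```

Choose the time window and threshold to suit the health check interval; a short interval produces more transitions for the same underlying problem.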

Connection Issues

Backend Health Status Failure Reason: Timed out

Oracle Cloud Infrastructure category: Backend Health Status

Error messages:
"msg":"connect timed out","elapsed":3000

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The client fails with a 502 Bad Gateway error.

  • The backend is periodically or chronically failing health checks.

  • "connect timed out" appears in the error logs.

Possible causes:
  • The backend server is not responding to health checks in the expected time period.

  • A slow upstream dependency, such as a database, an application service or API, or a storage service such as Oracle Cloud Infrastructure File Storage, Block Volume, or Object Storage.

Possible solutions:
  • Perform a local test to the backend to eliminate the load balancer as a cause.

  • Check the performance of all upstream dependencies.

  • Check application logs on the backend server for any dependencies reporting any sort of timeout.

Troubleshooting documentation: Testing TCP and HTTP Backend Servers.

SSL Errors

Learn about Secure Sockets Layer (SSL) errors associated with Load Balancer resources.

SSL Virtual Listener Issues
Error message:
Not all SSL virtual listeners on port 443 have the same set of SSL protocols defined

Symptoms:

You cannot create backends for an existing load balancer, nor can you add new servers to a backend created previously within the same load balancer.

Possible causes:

Mismatch of transport layer security (TLS) versions.

Possible solutions:

Match TLS versions on the listeners.

Troubleshooting documentation: SSL Certificate for Load Balancers
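
Before you change a listener, you can check whether all listeners that share a port define the same set of SSL protocols. The following Python sketch works on a plain list of dictionaries with name, port, and protocols keys. That shape is hypothetical and chosen for illustration; it is not the format returned by any Oracle Cloud Infrastructure API.

```python
from collections import defaultdict


def mismatched_ports(listeners):
    """Return the sorted ports whose listeners define differing protocol sets.

    The order of protocols within a listener does not matter.
    """
    protocol_sets_by_port = defaultdict(set)
    for listener in listeners:
        protocol_sets_by_port[listener["port"]].add(frozenset(listener["protocols"]))
    return sorted(port for port, protocol_sets in protocol_sets_by_port.items()
                  if len(protocol_sets) > 1)


if __name__ == "__main__":
    example = [
        {"name": "listener_a", "port": 443, "protocols": ["TLSv1.2", "TLSv1.3"]},
        {"name": "listener_b", "port": 443, "protocols": ["TLSv1.2"]},
    ]
    print(mismatched_ports(example))
```

Any port that the function reports needs its listeners aligned to one set of TLS versions.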

SSL Handshake Issues
Error message:
(SSL: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol) while SSL handshake error

Oracle Cloud Infrastructure log category: Client log

Symptoms:

The client experiences SSL handshake failures in Oracle Cloud Infrastructure metrics (see Load Balancing Metrics).

Possible causes:

The backend is not configured to accept SSL.

Possible solutions:
  • Confirm that the backend certificate matches the certificate authority that is provided.

  • Ensure that all certificates in the chain are provided in the correct order in the Certificate field.

  • Ensure that you provide the correct certificate depth.

Troubleshooting documentation: SSL Certificate for Load Balancers

Backend SSL Handshake Issues
Error messages:
Peer backend_ip_address closed connection in SSL handshake

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The client fails with a 502 Bad Gateway error.

  • The client experiences SSL handshake failures in Oracle Cloud Infrastructure metrics (see Load Balancing Metrics).

Possible causes:
  • The backend is not configured to accept SSL.

  • The backend certificate is invalid.

Possible solutions:
  • Confirm that the backend certificate matches the certificate authority that is provided.

  • Ensure that all certificates in the chain are provided in the correct order in the Certificate field.

  • Ensure that you provide the correct certificate depth.

Troubleshooting documentation: SSL Certificate for Load Balancers

SSL Certificate Issues

Error:

Client backend_ip_address has SSL certificate verify error.

Oracle Cloud Infrastructure log category: Error log

Symptoms:

The client experiences SSL handshake failures in Oracle Cloud Infrastructure metrics (see Load Balancing Metrics).

Possible causes:
  • The client certificate is invalid.

  • The client certificate is not trusted.

  • Invalid peer certification verify depth.

Possible solutions:
  • Ensure that the client certificate is valid.

  • Remove Peer Cert Verify feature on the listener.

Troubleshooting documentation: Key Pair Mismatch and Private Key Consistency.

Client SSL Certificate Issues
Error message:
Client backend_ip_address sent no required SSL certificate

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The client experiences a 400 Response error.

  • no required SSL certificate appears in error logs.

Possible causes:

The client is not sending a client certificate.

Possible solutions:
  • Update the client to send the correct client certificate.

  • Remove Peer Cert Verify feature on the listener.

  • Adjust the certificate verification depth.

Troubleshooting documentation: Configuring Peer Certificate Verification.

SSL Error Causes Backend Health Check Failure
Error message:
"code":"EPROTO","errno":"EPROTO"

Oracle Cloud Infrastructure log category: Error log

Symptoms:

The backend health check fails because of the SSL error.

Possible causes:

The backend is configured to accept SSL but the health check protocol selected does not match that of the backend.

Possible solutions:

Confirm that the health check protocol matches the backend configuration. For example, do not use a non-TLS health check on a backend that has TLS enabled.

Troubleshooting documentation: Editing a Load Balancer's Health Check Policies

SSL Host Name Verification Fails
Error message:
SSL host name verification failed for host_name

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The client fails with a 502 Bad Gateway error.

  • Error message contains SSL host name verification failed.

Possible causes:

Host name provided does not match what is expected.

Possible solutions:
  • Configure client to use the expected host name.

  • Configure certificate to match the host name sent by the client.

Troubleshooting documentation: SSL Certificate for Load Balancers
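
To see why a host name does not match, compare it with the names listed in the certificate. The following Python sketch implements a simplified comparison: names are compared label by label without regard to case, and a wildcard is accepted only as the entire leftmost label, where it stands for exactly one label. It is not a complete implementation of certificate name validation, and the function names are illustrative.

```python
def hostname_matches(hostname, pattern):
    """Check one host name against one certificate name (simplified)."""
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard covers exactly one label, and only the leftmost one.
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def certificate_covers(hostname, certificate_names):
    """Return True when any name in the certificate matches the host name."""
    return any(hostname_matches(hostname, name) for name in certificate_names)


if __name__ == "__main__":
    names = ["example.com", "*.example.com"]
    print(certificate_covers("www.example.com", names))
    print(certificate_covers("a.b.example.com", names))
```

You can obtain the names in a certificate from the output of the openssl utility and pass them to certificate_covers together with the host name that the client sends.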

Client-Side Errors

Learn about client-side errors associated with Load Balancer resources.

Client Access Denied

Error:

Access for client_ip_address denied by HTTP ACL rule.

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The client fails with a 502 Bad Gateway error.

  • The backend does not pass health check.

  • forbidden by HTTP ACL rule appears in the error log.

Possible causes:

An access control rule set is enabled but does not include the source IP address.

Possible solutions:

Check the rule set and update it to include the source IP address.

Troubleshooting documentation: Access Control Rules
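
To confirm whether a source IP address falls within the address ranges that a rule set allows, you can test it against each CIDR block. The following Python sketch uses the standard ipaddress module. The function name is illustrative, and the list of CIDR blocks is assumed to have been copied by hand from the access control rules.

```python
import ipaddress


def is_source_allowed(source_ip, allowed_cidrs):
    """Return True when the source IP address is inside any allowed CIDR block."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr, strict=False)
               for cidr in allowed_cidrs)


if __name__ == "__main__":
    allowed = ["192.0.2.0/24", "203.0.113.0/24"]
    print(is_source_allowed("192.0.2.10", allowed))
    print(is_source_allowed("198.51.100.7", allowed))
```

Use the client IP address shown in the error log message as the source_ip value.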

Client Timeout Issue

Error:

Client client_name timed out

Oracle Cloud Infrastructure log category: Error log

Symptoms:
  • The client fails with a 502 Bad Gateway error.

  • The client experiences SSL handshake failures in Oracle Cloud Infrastructure metrics (see Load Balancing Metrics).

Possible causes:

The client terminated the connection sooner than the configured timeout for the load balancer.

Possible solutions:
  • Configure client timeout to match expected application configuration.

  • Determine why the backend did not respond in the configured amount of time.

Troubleshooting documentation: Testing TCP and HTTP Backend Servers.