4.16 Using HAProxy for PGX Load Balancing and High Availability

HAProxy is a high-performance TCP/HTTP load balancer and proxy server that distributes incoming requests across multiple back-end servers.

You can use HAProxy with multiple instances of the in-memory analyst server (PGX) for load balancing and high availability. The following example uses the opg shell to connect to PGX.

The following instructions assume you have already installed and configured the in-memory analyst server, as explained in Starting the In-Memory Graph Server (PGX).

  1. If HAProxy is not already installed on Big Data Appliance or your Oracle Linux distribution, run this command:
    yum install haproxy
  2. Start the PGX servers.

    For example, to load balance PGX across four nodes in the Big Data Appliance (such as bda02, bda03, bda04, and bda05), start PGX on each of these nodes. Configure each PGX instance to listen for connections on port 7007.

  3. Configure HAProxy.

    In this example, you will configure HAProxy to run on host bda01 and to listen for incoming connections on port 8888. Create a new file haproxy.cfg on host bda01 with the following content:

    global
        maxconn 50000
        log /dev/log local0
     
    defaults
        mode http
        option httplog
        log global
        option forwardfor
        timeout connect 5s
        timeout client 5s
        timeout server 5s
        balance source
        hash-type consistent
     
    listen www
        bind :8888
        server web1 bda02:7007 check
        server web2 bda03:7007 check
        server web3 bda04:7007 check
        server web4 bda05:7007 check
    

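    Specifying balance source makes HAProxy hash each client's IP address to select a back-end server, so requests from the same client IP address are routed to the same server for as long as that server is available. Specifying hash-type consistent limits how many clients are remapped when a server goes down or comes back up. This is important because the PGX server relies on session stickiness during an analytics session. (For more information about configuring HAProxy, see the HAProxy official documentation.)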

  4. Start the load balancer.

    Start HAProxy on bda01 by passing in the configuration file that you created in the preceding step:

    haproxy -f haproxy.cfg
  5. Test the load balancer.

    From any host you can test connectivity to the HAProxy server by passing in the host and port of the server running HAProxy as the base_url parameter to the opg client shell. For example:

    cd /opt/oracle/oracle-spatial-graph/property_graph
    ./bin/opg --base_url http://bda01:8888
    

    Note:

    The PGX in-memory state is lost if the server goes down. HAProxy will route commands to another server, but the client must reload all graph data.

    To test session affinity, run a series of PGX commands through the load balancer. Then stop one of the PGX servers and restart the opg shell to confirm that HAProxy redirects requests to another server.
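
The effect of combining balance source with hash-type consistent in the configuration above can be illustrated with a toy consistent-hash ring. The following Python sketch is not HAProxy's implementation: the hash function, the number of points per server, the HashRing class, and the client addresses (192.0.2.x) are all assumptions made for illustration. It only shows the general idea that a given client IP address keeps mapping to the same server, and that when one server (here bda03) is removed, only the clients that were mapped to it are reassigned.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on the ring (illustrative; not HAProxy's hash)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring: each server owns many points on a circle."""

    def __init__(self, servers, points_per_server=100):
        self._ring = sorted(
            (ring_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._points = [point for point, _ in self._ring]

    def pick(self, client_ip: str) -> str:
        """Return the server owning the first point at or after the client's hash."""
        index = bisect.bisect_left(self._points, ring_hash(client_ip)) % len(self._ring)
        return self._ring[index][1]


# Back-end names taken from the haproxy.cfg in this section.
SERVERS = ["bda02:7007", "bda03:7007", "bda04:7007", "bda05:7007"]
# Hypothetical client addresses, for illustration only.
CLIENTS = [f"192.0.2.{n}" for n in range(1, 101)]

full_ring = HashRing(SERVERS)
before = {ip: full_ring.pick(ip) for ip in CLIENTS}

# Simulate bda03 going down: rebuild the ring without it.
degraded_ring = HashRing([s for s in SERVERS if s != "bda03:7007"])
after = {ip: degraded_ring.pick(ip) for ip in CLIENTS}

# Only clients that were on bda03 should appear here.
moved = [ip for ip in CLIENTS if before[ip] != after[ip]]
print(f"{len(moved)} of {len(CLIENTS)} clients were remapped")
for ip in moved[:3]:
    print(f"  {ip}: {before[ip]} -> {after[ip]}")
```

Clients that were mapped to the removed server land on a different PGX instance, which is the situation described in the note above: their in-memory state is gone and the graph data must be reloaded.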