14 Configuring Oracle Traffic Director for High Availability

This chapter describes the high-availability capabilities of Oracle Traffic Director. It contains the following sections:

14.1 Overview of High Availability

In the context of Oracle Traffic Director instances, high availability includes the following capabilities:

  • Receive and serve client requests without downtime caused by hardware failures, kernel crashes, and network issues.

    • You can set up a highly available traffic routing and load-balancing service for your enterprise applications and services by configuring two Oracle Traffic Director instances to provide active-active or active-passive failover.

    • If an Oracle Traffic Director process crashes, it restarts automatically.

      Oracle Traffic Director provides two levels of availability: application level and node level. Application-level availability is the default and does not require any additional configuration. It ensures that the load balancing service is monitored through the Oracle Traffic Director Watchdog daemon and remains available even during application-level failures such as a process crash. This feature ensures that Oracle Traffic Director, as a software load balancer, can continue to front-end requests to back-end applications even if there is a software issue within the load balancing service. Node-level availability ensures that Oracle Traffic Director continues to front-end requests to back-end applications even if the system or vServer crashes because of issues such as CPU failure or memory corruption. For node-level availability, Oracle Traffic Director must be installed on two compute nodes or vServers, and a failover group must be configured between them.

      To provide high availability to the Oracle Traffic Director instance itself, each load balancer server instance includes at least three processes: a watchdog process, a primordial process, and one or more load balancer processes. The watchdog process spawns the primordial process, which then spawns the load balancer processes. The watchdog process and the primordial process provide a limited level of high availability within the Oracle Traffic Director processes. If a load balancer process or the primordial process terminates abnormally for any reason, the Oracle Traffic Director watchdog restarts it, ensuring that Oracle Traffic Director continues to be available as a software load balancer service. An Oracle Traffic Director instance has exactly one watchdog process, one primordial process, and one or more load balancer processes.

    • Most configuration changes to Oracle Traffic Director instances can be deployed dynamically, without restarting the instances and without affecting requests that are being processed. For configuration changes that require instances to be restarted, Fusion Middleware Control displays a prompt to restart the instances. This prompt is not displayed for WLST commands; instead, an error is logged in the server log.

  • Distribute client requests reliably to origin servers in the back end.

    • If a server in the back end is no longer available or is fully loaded, Oracle Traffic Director detects this situation automatically through periodic health checks and stops sending client requests to that server. When the failed server becomes available again, Oracle Traffic Director detects this automatically and resumes sending requests to the server. For more information, see Section 5.7, "Configuring Health-Check Settings for Origin-Server Pools."

    • In each origin-server pool, you can designate a few servers as backup servers. Oracle Traffic Director sends requests to the backup servers only when none of the primary servers in the pool is available. For more information, see Section 5.3, "Modifying an Origin-Server Pool."

    • You can reduce the possibility of requests being rejected by origin servers due to a connection overload, by specifying the maximum number of concurrent connections that each origin server can handle.

      For each origin server, you can also specify the duration over which the rate of sending requests to the server is increased. This capability helps minimize the possibility of requests getting rejected when a server that was offline is in the process of restarting.

      For more information, see Section 6.3, "Modifying an Origin Server."
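The ramp-up behavior described above can be sketched as follows. This is a minimal illustration of the idea only; the function name and the linear ramp are assumptions for illustration, not Oracle Traffic Director's actual algorithm:

```python
def allowed_connections(max_connections, ramp_up_seconds, elapsed_seconds):
    """Illustrative linear ramp: the share of an origin server's maximum
    concurrent connections that the load balancer is willing to use, as a
    function of time since the server came back online (assumed model)."""
    if elapsed_seconds >= ramp_up_seconds:
        return max_connections
    # Scale the connection limit linearly over the ramp-up window,
    # always allowing at least one connection.
    return max(1, int(max_connections * elapsed_seconds / ramp_up_seconds))
```

For example, with a 100-connection limit and a 60-second ramp-up window, the limit grows from 1 connection at the moment the server returns to the full 100 after a minute.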

14.1.1 High Availability in Network Topology

You can ensure high availability of Oracle Traffic Director instances by combining two or more Oracle Traffic Director instances in a failover group represented by one or two virtual IP (VIP) addresses. Both the hosts in a failover group must run the same operating system version, use identical patches and service packs, and run Oracle Traffic Director instances of the same configuration.

Note:

  • You can create multiple failover groups for the same instance, but with a distinct VIP address for each failover group.

  • On Oracle SuperCluster and Exalogic (Solaris), Oracle Traffic Director can be configured for high availability only when installed on a global zone. In addition, all administration nodes must be running on the global zone.

Figure 14-1 shows Oracle Traffic Director network topology.

Figure 14-1 Oracle Traffic Director Network Topology


The topology shown in Figure 14-1 consists of two Oracle Traffic Director instances—otd_1 and otd_2—forming a failover pair and providing a single virtual IP address for client requests. Based on the mode of failover configured, the primary node will determine how and where to forward the request. For information on failover modes, see Section 14.1.2, "Failover configuration modes".

Note that Figure 14-1 shows only two server pools in the back end, but you can configure Oracle Traffic Director to route requests to servers in multiple server pools.

14.1.2 Failover configuration modes

You can configure the Oracle Traffic Director instances in a failover group to work in the following modes:

  • Active-passive: A single VIP address is used. One instance in the failover group is designated as the primary node. If the primary node fails, the requests are routed through the same VIP to the other instance.

  • Active-active: A single VIP address is used. One of the nodes is the master node, and the other nodes are backup nodes. Incoming requests to the VIP are distributed among the OTD instances. If the master node fails, the backup node with the highest priority is chosen as the next master node.

Figure 14-2 High Availability deployment of Oracle Traffic Director


14.1.3 Failover in Active-Passive Mode

In the active-passive setup described here, one node in the failover group is redundant at any point in time. To improve resource utilization, you can configure the two Oracle Traffic Director instances in active-active mode with two virtual IP addresses. Each instance caters to requests received on one virtual IP address and backs up the other instance.

Oracle Traffic Director provides support for failover between the instances in a failover group by using an implementation of the Virtual Routing Redundancy Protocol (VRRP), such as keepalived for Linux and vrrpd (native) for Solaris.

Figure 14-3 Failover in Active-Passive Mode


Keepalived provides other features such as load balancing and health check for origin servers, but Oracle Traffic Director uses only the VRRP subsystem. For more information about Keepalived, go to http://www.keepalived.org.

Note:

Ensure that Keepalived v1.2.12 (the minimum required version) is installed.

VRRP specifies how routers can failover a VIP address from one node to another if the first node becomes unavailable for any reason. The IP failover is implemented by a router process running on each of the nodes. In a two-node failover group, the router process on the node to which the VIP is currently assigned is called the master. The master continuously advertises its presence to the router process on the second node.

Caution:

On a host that has an Oracle Traffic Director instance configured as a member of a failover group, Oracle Traffic Director should be the only consumer of Keepalived. Otherwise, when Oracle Traffic Director starts and stops the keepalived daemon for effecting failovers during instance downtime, other services using keepalived on the same host can be disrupted.

If the node on which the master router process is running fails, the router process on the second node waits for about three seconds before deciding that the master is down, and then assumes the role of the master by assigning the VIP to its node. When the first node is online again, the router process on that node takes over the master role. For more information about VRRP, see RFC 5798 at http://datatracker.ietf.org/doc/rfc5798.
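For illustration, a minimal keepalived.conf VRRP fragment for the active-passive setup described above might look like the following. The interface name, router ID, priority values, and VIP are placeholders, and the configuration that Oracle Traffic Director generates will differ:

```
vrrp_instance otd_failover {
    state BACKUP            # both nodes start as BACKUP; priority decides the master
    interface eth0          # network interface carrying the VIP (placeholder)
    virtual_router_id 254   # must be unique for each failover group
    priority 150            # higher value on the node intended as primary
    advert_int 1            # advertisement interval, in seconds
    virtual_ipaddress {
        192.0.2.1           # the failover group's VIP (placeholder)
    }
}
```

The node whose router process advertises the higher priority holds the VIP; when its advertisements stop, the peer takes over the master role as described above.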

14.1.4 Failover in Active-Active Mode

Oracle Traffic Director provides support for failover between instances by deploying two or more OTD instances on nodes that are in the same subnet. One of the nodes is chosen as the active router node, and the remaining nodes are the backup router nodes. Traffic is managed among all the OTD instances.

The solution also uses Keepalived v1.2.13 and Linux Virtual Server (LVS) to perform load balancing and failover tasks. In addition, the following packages are required:

  • ipvsadm (1.26 or later)

  • iptables (1.4.7 or later)

Figure 14-4 Failover in Active-Active Mode


In the beginning, all the nodes are configured as the backup nodes and the nodes are assigned different priorities. The highest priority node is chosen as the master and the other nodes are the backup nodes. If the master node fails, then the backup node having the highest priority is chosen as the next master node. The keepalived master node will also be the master node for LVS.
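The priority-based election described above can be sketched as follows. This is a simplified model for illustration (the function name and data shapes are invented), not the VRRP implementation itself:

```python
def elect_master(nodes):
    """Pick the master: the alive node with the highest priority.

    `nodes` maps node name -> (priority, alive); returns the elected
    node's name, or None if no node is alive.  A simplified model of
    the VRRP election described above, not the keepalived logic."""
    alive = {name: prio for name, (prio, up) in nodes.items() if up}
    if not alive:
        return None
    # The highest-priority surviving node becomes the next master.
    return max(alive, key=alive.get)
```

For example, if the current master fails, the election over the remaining nodes yields the backup node with the next-highest priority.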

Keepalived does the following:

  • Plumbs the virtual IP on the master

  • Sends out gratuitous ARP messages for the VIP

  • Configures LVS (via ipvsadm)

  • Performs health checks for Keepalived on the other nodes

LVS does the following:

  • Balances the load across the OTD instances

  • Shares existing connection information with the backup nodes via multicasting

  • Checks the integrity of the services on each OTD instance. If an OTD instance fails, it is removed from the LVS configuration; when it comes back online, it is added again.
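The remove-on-failure and re-add-on-recovery behavior described above can be modeled as follows. This is an illustrative sketch (the class and method names are invented), not the LVS health-check implementation:

```python
class RealServerPool:
    """Toy model of how LVS maintains its set of real servers (OTD
    instances): failed instances are removed from the configuration
    and re-added when they come back online."""

    def __init__(self, instances):
        self.configured = set(instances)  # instances currently in the LVS config

    def health_report(self, instance, healthy):
        if healthy:
            self.configured.add(instance)      # back online: add it again
        else:
            self.configured.discard(instance)  # failed: remove from the config
```

A failed instance simply drops out of the configured set and returns once it reports healthy again.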

14.2 Creating and Managing Failover Groups

This section contains the following topics:

14.2.1 Creating Failover Groups

This section describes how to implement a highly available pair of Oracle Traffic Director instances by creating failover groups.

Note:

For information about invoking WLST, see Section 1.7.1, "Accessing WebLogic Scripting Tool."

Before You Begin

  • Decide the unique VIP address that you want to assign to the failover group.

    • The VIP address should belong to the same subnet as that of the nodes in the failover group.

    • The VIP address must be accessible to clients.

  • Identify the Oracle Traffic Director nodes that you want to configure as primary and backup nodes in the failover group. The nodes should be in the same subnet.

    Note that the administration nodes that you select should have Oracle Traffic Director instances present on them for the specified configuration.

  • Identify the network interface for each node.

    For each network interface that is currently up on the host, the administration server compares the network part of the interface's IP address with the network part of the specified VIP. The first network interface that results in a match is used as the network interface for the VIP.

    For this comparison, depending on whether the VIP specified for the failover group is an IPv4 or IPv6 address, the administration server considers only those network interfaces on the host that are configured with an IPv4 or IPv6 address, respectively.

  • You can bind to a VIP address within the HTTP listener by performing a system configuration that allows binding to a nonexistent address, as a form of forward binding. Perform one of the following system configurations:

    echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind

    or,

    sysctl -w net.ipv4.ip_nonlocal_bind=1 (add net.ipv4.ip_nonlocal_bind = 1 to /etc/sysctl.conf to persist the setting across reboots)

    Make sure that the IP addresses of the listeners in the configuration for which you want to create a failover group are either an asterisk (*) or the same address as the VIP. Otherwise, requests sent to the VIP will not be routed to the virtual servers.

  • Make sure that the router ID for each failover group is unique. For every subsequent failover group that you create, the default router ID is decremented by one: 254, 253, and so on.
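The interface-selection rule described in the "Identify the network interface" step above can be illustrated with Python's ipaddress module. The function below is a sketch of the described matching, not the administration server's actual code, and the interface data in the test is hypothetical:

```python
import ipaddress

def pick_interface(vip, interfaces):
    """Return the first interface whose network contains the VIP.

    `interfaces` is a list of (name, ip, prefix_len) tuples for the
    interfaces that are up on the host.  Only interfaces of the same
    address family (IPv4 or IPv6) as the VIP are considered, as the
    matching rule above describes."""
    vip_addr = ipaddress.ip_address(vip)
    for name, ip, prefix in interfaces:
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        # Compare the network part of the interface address with the VIP.
        if net.version == vip_addr.version and vip_addr in net:
            return name
    return None
```

For example, given interfaces on 192.0.2.0/24 and 198.51.100.0/24, a VIP of 192.0.2.1 selects the first interface; a VIP in neither subnet matches no interface.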

14.2.1.1 Creating Failover Groups Using Fusion Middleware Control

To create a failover group by using the Fusion Middleware Control, do the following:

  1. Log in to Fusion Middleware Control, as described in Section 1.7.2, "Displaying Fusion Middleware Control."

  2. Click the WebLogic Domain button at the upper left corner of the page.

  3. Select Administration > OTD Configurations.

    A list of the available configurations is displayed.

  4. Select the configuration for which you want to create a failover group.

  5. Click Traffic Director Configuration in the Common Tasks pane.

  6. Select Administration > Failover Groups.

    The Failover Groups page is displayed. It shows a list of the Failover Groups defined for the configuration.

  7. Click Create.

    The New Failover Group wizard is displayed.

    Figure 14-5 New Failover Group Wizard


  8. Follow the on-screen prompts to complete creation of the failover group by using the details—virtual IP address, network interface, host names of administration nodes, and so on—that you decided earlier.

    After the failover group is created, the Results screen of the New Failover Group wizard displays a message confirming successful creation of the failover group.

  9. Click Close on the Results screen.

    The details of the failover group that you just created are displayed on the Failover Groups page.

    Note:

    At this point, the two nodes form an active-passive pair. To convert them into an active-active pair, create another failover group with the same two nodes, but with a different VIP and with the primary and backup roles reversed.

14.2.1.2 Creating Failover Groups Using WLST

To create a failover group, run the otd_createFailoverGroup command.

For example, the following command creates an active-passive failover group with the following details:

  • Configuration: ha

  • Primary instance: 1.example.com

  • Backup instance: 2.example.com

  • Virtual IP address: 192.0.2.1

props = {}
props['configuration'] = 'ha'
props['virtual-ip'] = '192.0.2.1'
props['primary-instance'] = '1.example.com'
props['backup-instance'] = '2.example.com'
props['primary-nic'] = 'eth0'
props['backup-nic'] = 'eth0'

props['failover-type'] = 'active-passive'
otd_createFailoverGroup(props)

Note:

After creating a failover group, you must run otd_startFailover as the root user on the machines that host the primary and backup instances, to manually start the failover daemon. If this command is not executed, failover does not start and there is no high availability. For more information about otd_startFailover, see WebLogic Scripting Tool Command Reference for Oracle Traffic Director.

For example, the following command creates an active-active failover group with the following details:

  • Configuration: ha

  • Virtual IP address: 192.0.2.1

props = {}
props['configuration'] = 'ha'
props['virtual-ip'] = '192.0.2.1'
props['failover-type'] = 'active-active'
otd_createFailoverGroup(props)
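Note that the active-active example above specifies only the configuration, virtual IP, and failover type. Instances are added to an active-active failover group separately, using the otd_addFailoverInstance command. For example, the following sketch adds the instance on 1.example.com to the group; run a similar command for each instance (the NIC name is a placeholder):

```
props = {}
props['configuration'] = 'ha'
props['virtual-ip'] = '192.0.2.1'
props['instance'] = '1.example.com'
props['nic'] = 'eth0'
otd_addFailoverInstance(props)
```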

For more information about otd_createFailoverGroup, see WebLogic Scripting Tool Command Reference for Oracle Traffic Director.

14.2.2 Managing Failover Groups

Because the failover daemon must run as a privileged user (typically root), the otd_startFailover command should be executed as a privileged user on the machines on which the primary and backup instances of the failover group run. Similarly, to stop the daemon, run the otd_stopFailover command. The configuration parameters for the keepalived daemon are stored in a file named keepalived.conf in the config directory of each instance that is part of the failover group. For more information about otd_startFailover or otd_stopFailover, see the WebLogic Scripting Tool Command Reference for Oracle Traffic Director.

After creating failover groups, you can list them, view their settings, change the primary instance for a failover group, switch the primary and backup instances, and delete them. Note that to change the VIP or any property of a failover group, you should delete the failover group and create it afresh.

You can view, modify, and delete failover groups by using either Fusion Middleware Control or the WLST.

Note:

For information about invoking WLST, see Section 1.7.1, "Accessing WebLogic Scripting Tool."

Managing Failover Groups Using Fusion Middleware Control

To view, modify, and delete failover groups by using the Fusion Middleware Control, do the following:

  1. Log in to Fusion Middleware Control, as described in Section 1.7.2, "Displaying Fusion Middleware Control."

  2. Click the WebLogic Domain button at the upper left corner of the page.

  3. Select Administration > OTD Configurations.

    A list of the available configurations is displayed.

  4. Select the configuration for which you want to manage failover groups.

  5. Click Traffic Director Configuration in the Common Tasks pane.

  6. Select Administration > Failover Groups.

  7. The Failover Groups page is displayed. It shows a list of the Failover Groups defined for the configuration.

    • To view the properties of a failover group, click its virtual IP.

    • To switch the hosts for the primary and backup nodes, click the Toggle Primary button. In the resulting dialog box, click OK.

    • To delete a failover group, click the Delete button. In the resulting dialog box, click OK.

  8. If you add or remove a failover instance from an active-active failover group, stop and start the failover group on all nodes for the changes to take effect.

Note:

  • If you want to assign a different node as the primary or backup node in a failover group, you should create the failover group afresh.

  • There can be a maximum of 255 failover groups across configurations.

Managing Failover Groups Using WLST

To list the failover groups, run the otd_listFailoverGroups command. For example:

props = {}
props['configuration'] = 'ha'
otd_listFailoverGroups(props)

To switch the primary and backup instances of a failover group, run the otd_toggleFailovergroupPrimary command. For example:

props = {}
props['configuration'] = 'ha'
props['virtual-ip'] = '10.128.67.44'
otd_toggleFailovergroupPrimary(props)

To view the properties of a failover group, run the otd_getFailoverGroupProperties command. For example:

props = {}
props['configuration'] = 'ha'
props['primary-instance'] = '1.example.com'
otd_getFailoverGroupProperties(props)

To delete a failover group, run the otd_deleteFailoverGroup command. For example:

props = {}
props['configuration'] = 'ha'
props['virtual-ip'] = '10.128.67.44'
otd_deleteFailoverGroup(props)

WLST Commands Specific to Active-Active HA

To add a failover instance, run the otd_addFailoverInstance command. For example:

props = {}
props['configuration'] = 'ha'
props['virtual-ip'] = '10.128.67.44'
props['instance'] = '1.example.com'
props['nic'] = 'eth0'
otd_addFailoverInstance(props)

To remove a failover instance, run the otd_removeFailoverInstance command. For example:

props = {}
props['configuration'] = 'ha'
props['virtual-ip'] = '10.128.67.44'
props['instance'] = '1.example.com'
otd_removeFailoverInstance(props)

To list the failover instances, run the otd_listFailoverInstances command. For example:

props = {}
props['configuration'] = 'ha'
props['virtual-ip'] = '10.128.67.44'
otd_listFailoverInstances(props)

To change the failover instance order, run the otd_setFailoverInstanceOrder command. For example:

props = {}
props['configuration'] = 'ha'
props['virtual-ip'] = '10.128.67.44'
props['instances'] = '1.example.com, 2.example.com'
otd_setFailoverInstanceOrder(props)

For more information, see WebLogic Scripting Tool Command Reference for Oracle Traffic Director.