14.1 Overview of High-Availability Features

In the context of Oracle Traffic Director instances, high availability includes the following capabilities:

Receive and serve client requests without downtime caused by hardware failures, kernel crashes, and network issues.
- You can set up a highly available traffic routing and load-balancing service for your enterprise applications and services by configuring two Oracle Traffic Director instances to provide active-active or active-passive failover. For more information, see Section 14.2, "Creating and Managing Failover Groups."
- If an Oracle Traffic Director process crashes, it restarts automatically.
  
  Oracle Traffic Director provides two levels of availability: application level and node level.

  Application-level availability is the default and does not require any additional configuration. The Oracle Traffic Director watchdog daemon monitors the load-balancing service and keeps it available even during application-level failures such as a process crash. As a result, Oracle Traffic Director, as a software load balancer, can continue to front-end requests to back-end applications even if there is a software issue within the load-balancing service.

  Node-level availability ensures that Oracle Traffic Director continues to front-end requests to back-end applications even if the system or vServer crashes because of issues such as CPU failure or memory corruption. For node-level availability, Oracle Traffic Director must be installed on two compute nodes or vServers, and a failover group must be configured between them.
  
  To provide high availability to the Oracle Traffic Director instance itself, each instance runs at least three processes: exactly one watchdog process, exactly one primordial process, and one or more load balancer processes. The watchdog process spawns the primordial process, which then spawns the load balancer processes. The watchdog process and the primordial process provide a limited level of high availability within the server processes. If a load balancer process or the primordial process terminates abnormally for any reason, the Oracle Traffic Director watchdog is responsible for restarting it, so that Oracle Traffic Director continues to be available as a software load balancer service.
- Most configuration changes to Oracle Traffic Director instances can be deployed dynamically, without restarting the instances and without affecting requests that are being processed. For configuration changes that do require instances to be restarted, the administration interfaces—CLI and administration console—display a prompt to restart the instances.
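The restart-on-crash behavior described above can be illustrated with a minimal supervisor loop. This is a generic sketch of the watchdog pattern, not Oracle Traffic Director's implementation: the function name `supervise`, the `max_restarts` budget, and the rule that exit status 0 counts as a clean shutdown are all assumptions made for the example.

```python
import subprocess
import sys


def supervise(command, max_restarts=3):
    """Run `command` and restart it after every abnormal exit.

    Hypothetical helper for illustration only. Returns the number of
    restarts performed. The loop ends when the child exits cleanly
    (status 0) or when the restart budget is used up.
    """
    restarts = 0
    while True:
        child = subprocess.Popen(command)  # spawn the supervised process
        status = child.wait()              # block until it terminates
        if status == 0:
            return restarts                # clean exit: nothing to restart
        if restarts >= max_restarts:
            return restarts                # budget exhausted: give up
        restarts += 1                      # abnormal exit: start it again
```

For example, `supervise([sys.executable, "-c", "import sys; sys.exit(1)"], max_restarts=2)` starts a child that always fails, restarts it twice, and then gives up. A production watchdog would typically also log each restart and wait between attempts.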
Distribute client requests reliably to origin servers in the back end.
- If a server in the back end is no longer available or is fully loaded, Oracle Traffic Director detects this situation automatically through periodic health checks and stops sending client requests to that server. When the failed server becomes available again, Oracle Traffic Director detects this automatically and resumes sending requests to the server. For more information, see Section 14.3, "Configuring Health-Check Settings for Origin-Server Pools."
- In each origin-server pool, you can designate a few servers as backup servers. Oracle Traffic Director sends requests to the backup servers only when none of the primary servers in the pool is available. For more information, see Section 6.3, "Modifying an Origin-Server Pool."
- You can reduce the possibility of origin servers rejecting requests because of connection overload by specifying the maximum number of concurrent connections that each origin server can handle.
  
  For each origin server, you can also specify the duration over which the rate of sending requests to the server is increased. This capability helps minimize the possibility of requests getting rejected when a server that was offline is in the process of restarting.
  
  For more information, see Section 7.3, "Modifying an Origin Server."
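
The origin-server behaviors listed above (health-based exclusion, backup servers, connection limits, and gradual ramp-up) can be sketched as a small selection routine. This is an illustrative model only, not Oracle Traffic Director code or configuration syntax. The names `OriginServer`, `eligible_servers`, and `ramp_up_fraction` are hypothetical, a primary is treated as "available" when it is healthy, a limit of 0 is taken to mean "no limit", and the ramp-up is assumed to be linear.

```python
from dataclasses import dataclass


@dataclass
class OriginServer:
    """Hypothetical model of one origin server in a pool."""
    name: str
    backup: bool = False          # True if designated as a backup server
    healthy: bool = True          # would be updated by periodic health checks
    max_connections: int = 0      # 0 means "no limit" in this sketch
    active_connections: int = 0

    def has_capacity(self) -> bool:
        # A server at its connection limit receives no new requests.
        return (self.max_connections == 0
                or self.active_connections < self.max_connections)


def eligible_servers(pool):
    """Return the servers that may receive the next request.

    Backups are considered only when no primary server is healthy.
    """
    healthy_primaries = [s for s in pool if s.healthy and not s.backup]
    if healthy_primaries:
        candidates = healthy_primaries
    else:
        candidates = [s for s in pool if s.healthy and s.backup]
    return [s for s in candidates if s.has_capacity()]


def ramp_up_fraction(seconds_since_recovery, ramp_up_seconds):
    """Fraction of its normal share that a recovering server receives.

    Assumes a linear increase over the ramp-up duration.
    """
    if ramp_up_seconds <= 0:
        return 1.0  # no ramp-up configured: full share immediately
    return max(0.0, min(1.0, seconds_since_recovery / ramp_up_seconds))
```

In this model, a pool with one healthy primary, one primary at its connection limit, and one backup yields only the first primary. Once every primary is marked unhealthy, the backup becomes the only eligible server.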