Availability strategies for Java Enterprise System deployments include the following:
Load balancing. Uses redundant hardware and software components to share a processing load. A load balancer directs any requests for a service to one of multiple symmetric instances of the service. If any one instance should fail, other instances are available to assume a heavier load.
Failover. Involves managing redundant hardware and software to provide continuous access to services and security for critical data if any component fails.
Sun Cluster software provides a failover solution for critical data managed by back-end components such as the message storage for Messaging Server and calendar data for Calendar Server.
Replication of services. Replication of services provides multiple sources for access to the same data. Directory Server provides numerous replication and synchronization strategies for LDAP directory access.
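The load balancing strategy described above can be illustrated with a minimal sketch. It assumes a simple round-robin policy and a hypothetical RoundRobinBalancer class; neither is part of Java Enterprise System, and a production load balancer would also perform health checks and session handling. The sketch only shows how requests continue to reach the surviving symmetric instances when one instance fails.

```python
class RoundRobinBalancer:
    """Hypothetical balancer: rotates over symmetric instances, skipping failed ones."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._next = 0  # index of the next instance to try

    def mark_failed(self, instance):
        # Take an instance out of rotation; the others absorb its load.
        self.healthy.discard(instance)

    def mark_recovered(self, instance):
        # Return a known instance to rotation.
        if instance in self.instances:
            self.healthy.add(instance)

    def route(self):
        # Try each instance at most once, starting at the rotation pointer.
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")
```

With three instances, requests rotate across all three. After mark_failed is called for one of them, route returns only the two remaining instances, each of which therefore carries a heavier share of the load.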
The following sections provide some examples of availability solutions that provide various levels of load balancing, failover, and replication of services.
Place all computing resources for a service on a single server. If the server fails, the entire service fails.
Sun offers high-end servers with the following benefits:
Replacement and reconfiguration of hardware components while the system is running
Ability to run multiple applications in fault-isolated domains on the server
Ability to upgrade capacity, performance speed, and I/O configuration without rebooting the system
A high-end server typically costs more than a comparable multi-server system. However, a single server provides savings on administration, monitoring, and hosting costs for servers in a data center. Load balancing, failover, and removal of single points of failure are more flexible with multi-server systems.
There are several ways to increase availability with parallel redundant servers that provide both load balancing and failover. The following figure illustrates two replicated servers providing an N+1 failover system. An N+1 system has an additional server to provide 100% capacity should one server fail.
The computing power of each server in Horizontally Redundant Systems above is identical. One server alone handles the performance requirements. The other server provides 100% of the performance when called into service as a backup.
The advantage of an N+1 failover design is 100% performance during a failover situation. Disadvantages include increased hardware costs with no corresponding gain in overall performance (because one server is a standby for use in failover situations only).
The following figure illustrates a system that implements load balancing plus failover that distributes the performance between two servers.
In the system depicted in Horizontally Redundant Systems above, if one server fails, all services remain available, although at a percentage of the full capacity. The remaining server provides 6 CPUs of computing power, which is 60% of the 10-CPU requirement.
An advantage of this design is the additional 2 CPUs of latent capacity when both servers are available (together the two servers provide 12 CPUs against the 10-CPU requirement).
The following figure illustrates a distribution between a number of servers for performance and load balancing.
Because there are five servers in the design depicted in Horizontally Redundant Systems, if one server fails the remaining four servers provide a total of 8 CPUs of computing power, which is 80% of the 10-CPU performance requirement. If you add one more server with a 2-CPU capacity to the design, you effectively have an N+1 design. If one server fails, 100% of the performance requirement is met by the remaining servers.
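The capacity figures quoted for these designs can be checked with a short calculation. The following is a sketch only: the function name capacity_after_failures is hypothetical, the 10-CPU size of each server in the N+1 design is an assumption inferred from the statement that one server alone handles the requirement, and the result is capped at 100% so that spare capacity is not reported as extra performance.

```python
def capacity_after_failures(server_cpus, required_cpus, failed=1):
    """Percentage of the requirement still met after `failed` servers go down.

    Assumes the worst case, in which the largest servers are the ones that fail.
    """
    if failed < 0 or failed > len(server_cpus):
        raise ValueError("failed must be between 0 and the number of servers")
    # Keep the smallest servers, dropping the `failed` largest ones.
    surviving = sorted(server_cpus)[: len(server_cpus) - failed]
    return min(100.0, 100.0 * sum(surviving) / required_cpus)


REQUIRED = 10  # CPU requirement used in the examples in this section

designs = {
    "N+1, two servers (assumed 10 CPUs each)": [10, 10],
    "load balancing, two 6-CPU servers": [6, 6],
    "five 2-CPU servers": [2] * 5,
    "six 2-CPU servers": [2] * 6,
}

for name, cpus in designs.items():
    pct = capacity_after_failures(cpus, REQUIRED)
    print(f"{name}: {pct:.0f}% of requirement after one failure")
```

Passing failed=2 for the five-server design shows the effect of a second outage: three servers remain, providing 6 of the required 10 CPUs.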
This design includes the following advantages:
Added performance if a single server fails
Availability even when more than one server is down
Servers can be rotated out of service for maintenance and upgrades
Multiple low-end servers typically cost less than a single high-end server
However, administration and maintenance costs can increase significantly with additional servers. You also have to consider costs for hosting the servers in a data center. At some point, adding more servers yields diminishing returns.
For situations that require a high degree of availability (such as four or five nines, that is, 99.99% or 99.999% uptime), you might consider Sun Cluster software as part of your availability design. A cluster system is the coupling of redundant servers with storage and other network resources. The servers in a cluster continually communicate with each other. If one of the servers goes offline, the remaining devices in the cluster isolate the server and fail over any application or data from the failing node to another node. This failover process is achieved relatively quickly with little interruption of service to the users of the system.
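To put "four or five nines" in concrete terms, the following sketch converts a number of nines into the downtime allowed per year. The function name allowed_downtime_minutes is hypothetical, and the calculation assumes a 365-day year and that availability is measured over the whole year with no excluded maintenance windows.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines):
    """Yearly downtime budget for an availability target of `nines` nines.

    Four nines means 99.99% availability, so 1 part in 10**4 may be downtime.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return MINUTES_PER_YEAR / 10 ** nines


for n in (3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes of downtime per year")
```

Under these assumptions, four nines allows roughly 52.6 minutes of downtime per year and five nines roughly 5.3 minutes, which is why the failover time of the cluster matters at these availability levels.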
Sun Cluster software requires additional dedicated hardware and specialized skills to configure, administer, and maintain.