When developing a strategy for availability requirements, study the component interactions and usage analysis to determine which availability solutions to consider. Do your analysis on a component-by-component basis, determining a best-fit solution for availability and failover requirements.
The following items are examples of the type of information you gather to help determine availability strategies:
How many nines of availability are specified?
What are the performance specifications with respect to failover situations (for example, at least 50% of performance during failover)?
Does the usage analysis identify times of peak and non-peak usage?
What are the geographical considerations?
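The "number of nines" in the first question translates directly into a yearly budget for unplanned downtime. The following is a minimal sketch of that arithmetic. The function name `allowed_downtime_minutes` is hypothetical, and the calculation assumes a 365-day year with no allowance for scheduled maintenance.

```python
# Hypothetical helper: converts an availability target into allowed unplanned downtime.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum unplanned downtime per year, in minutes."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} minutes per year")
```

Under these assumptions, four nines (99.99%) allows about 52.6 minutes of unplanned downtime per year, and five nines (99.999%) allows about 5.3 minutes.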
The availability strategy you choose must also take into consideration serviceability requirements, as discussed in Designing for Optimum Resource Usage. Avoid complex solutions that require considerable administration and maintenance.
Availability strategies for Java Enterprise System deployments include the following:
Load balancing. Uses redundant hardware and software components to share a processing load. A load balancer directs any requests for a service to one of multiple symmetric instances of the service. If any one instance should fail, other instances are available to assume a heavier load.
Failover. Involves managing redundant hardware and software to provide continuous access to services and security for critical data if any component fails.
Sun Cluster software provides a failover solution for critical data managed by back-end components such as the message storage for Messaging Server and calendar data for Calendar Server.
Replication of services. Replication of services provides multiple sources for access to the same data. Directory Server provides numerous replication and synchronization strategies for LDAP directory access.
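The load balancing behavior described in the preceding list can be illustrated with a toy model. The sketch below is not how any Java Enterprise System load balancer is implemented. The class `RoundRobinBalancer` and the server names are hypothetical, and round-robin is only one of several possible distribution policies. It shows requests rotating across symmetric instances and the remaining instances absorbing the load when one fails.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: rotates over symmetric instances, skipping failed ones."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._rotation = cycle(self.instances)

    def mark_failed(self, instance):
        self.healthy.discard(instance)

    def mark_recovered(self, instance):
        if instance in self.instances:
            self.healthy.add(instance)

    def route(self):
        """Return the next healthy instance, or raise if none remain."""
        if not self.healthy:
            raise RuntimeError("no healthy instances available")
        for candidate in self._rotation:
            if candidate in self.healthy:
                return candidate


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
print([balancer.route() for _ in range(3)])  # all three instances share the load
balancer.mark_failed("server-b")
print([balancer.route() for _ in range(4)])  # server-a and server-c take over
```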
The following sections provide some examples of availability solutions that provide various levels of load balancing, failover, and replication of services.
Place all computing resources for a service on a single server. If the server fails, the entire service fails.
Sun provides high-end servers that provide the following benefits:
Replacement and reconfiguration of hardware components while the system is running
Ability to run multiple applications in fault-isolated domains on the server
Ability to upgrade capacity, performance speed, and I/O configuration without rebooting the system
A high-end server typically costs more than a comparable multi-server system. However, a single server provides savings on administration, monitoring, and hosting costs for servers in a data center. Load balancing, failover, and removal of single points of failure are more flexible with multi-server systems.
There are several ways to increase availability with parallel redundant servers that provide both load balancing and failover. The following figure illustrates two replicated servers providing an N+1 failover system. An N+1 system has an additional server to provide 100% capacity should one server fail.
The computing power of each server in Horizontally Redundant Systems above is identical. One server alone handles the performance requirements. The other server provides 100% of the performance when called into service as a backup.
The advantage of an N+1 failover design is 100% performance during a failover situation. Disadvantages include increased hardware costs with no corresponding gain in overall performance (because one server is a standby for use in failover situations only).
The following figure illustrates a system that implements load balancing plus failover that distributes the performance between two servers.
In the system depicted in Horizontally Redundant Systems above, if one server fails, all services are available, although at a percentage of the full capacity. The remaining server provides 6 CPUs of computing power, which is 60% of the 10 CPU requirement.
An advantage of this design is the additional 2 CPU latent capacity when both servers are available.
The following figure illustrates a distribution between a number of servers for performance and load balancing.
Because there are five servers in the design depicted in Horizontally Redundant Systems, if one server fails the remaining servers provide a total of 8 CPUs of computing power, which is 80% of the 10 CPU performance requirement. If you add an additional server with a 2-CPU capacity to the design, you effectively have an N+1 design. If one server fails, 100% of the performance requirement is met by the remaining servers.
This design includes the following advantages:
Added performance if a single server fails
Availability even when more than one server is down
Servers can be rotated out of service for maintenance and upgrades
Multiple low-end servers typically cost less than a single high-end server
However, administration and maintenance costs can increase significantly with additional servers. You also have to consider costs for hosting the servers in a data center. At some point you run into diminishing returns by adding additional servers.
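The capacity figures quoted for the horizontally redundant designs follow from simple arithmetic, which the following sketch reproduces. The function `capacity_after_failures` is a hypothetical helper. The 10-CPU requirement and the 6-CPU and 2-CPU server sizes come from the text above, while the 10-CPU size for each N+1 server is an assumption based on the statement that one server alone meets the requirement.

```python
def capacity_after_failures(server_cpus, required_cpus, failed=1):
    """Percentage of the requirement still met after the `failed` largest servers go down."""
    remaining_count = max(0, len(server_cpus) - failed)
    surviving = sorted(server_cpus)[:remaining_count]  # worst case: lose the largest
    return 100 * sum(surviving) / required_cpus


REQUIRED_CPUS = 10
designs = {
    "two 10-CPU servers (N+1, assumed size)": [10, 10],
    "two 6-CPU servers": [6, 6],
    "five 2-CPU servers": [2] * 5,
    "six 2-CPU servers (N+1)": [2] * 6,
}
for name, cpus in designs.items():
    percent = capacity_after_failures(cpus, REQUIRED_CPUS)
    print(f"{name}: {percent:.0f}% of requirement after one failure")
```

Passing `failed=2` for the five-server design shows that 60% of the requirement is still met with two servers down, which is the sense in which that design remains available when more than one server fails.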
For situations that require a high degree of availability (such as four or five nines), you might consider Sun Cluster software as part of your availability design. A cluster system is the coupling of redundant servers with storage and other network resources. The servers in a cluster continually communicate with each other. If one of the servers goes offline, the remainder of the devices in the cluster isolate the server and fail over any application or data from the failing node to another node. This failover process is achieved relatively quickly with little interruption of service to the users of the system.
Sun Cluster software requires additional dedicated hardware and specialized skills to configure, administer, and maintain.
This section contains two examples of availability strategies based on the identity-based communications solution for a medium-sized enterprise of about 1,000 to 5,000 employees, as described previously in Identity-Based Communications Example. The first availability strategy illustrates load balancing for Messaging Server. The second illustrates a failover solution that uses Sun Cluster software.
The following table lists the estimates for CPU power for each logical Messaging Server component in the logical architecture. This table repeats the final estimation calculated in the section Update the CPU Estimates.
Table 5–6 CPU Estimate Adjustments for Supporting Components
Component | CPUs | Memory
---|---|---
Messaging Server (MTA, inbound) | 2 | 4 GB
Messaging Server (MTA, outbound) | 2 | 4 GB
Messaging Server (MMP) | 2 | 4 GB
Messaging Server (Message Store) | 2 | 4 GB
For this example, assume that during the technical requirements phase, the following quality of service requirements were specified:
Availability. Overall system availability should be 99.99% (does not include scheduled downtime). Failure of an individual computer system should not result in service failure.
Scalability. No server should be more than 80% utilized under daily peak load and the system must accommodate long-term growth of 10% per year.
To fulfill the availability requirement, provide two instances of each Messaging Server component, with each instance on a separate hardware server. If the server for one instance of a component fails, the other instance provides the service. The following figure illustrates the network diagram for this availability strategy.
In the preceding figure the number of CPUs has doubled from the original estimate. The CPUs are doubled for the following reasons:
In the event one server fails, the remaining server provides the CPU power to handle the load.
For the scalability requirement that no single server is more than 80% utilized under peak load, the added CPU power provides this safety margin.
For the scalability requirement to accommodate 10% increased load per year, the added CPU power adds latent capacity that can handle increasing loads until additional scaling would be needed.
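The reasoning behind doubling the CPUs can be checked with a rough calculation. The sketch below rests on assumptions that are not stated in this guide: the 2-CPU estimate is read as the full daily peak load for one component today, the load is split evenly across the two instances, and growth compounds at 10% per year. All constant and function names are hypothetical.

```python
ESTIMATED_PEAK_CPUS = 2.0  # per-component estimate from the table (assumed to be peak load)
CPUS_PER_SERVER = 2        # each of the two instances receives the full estimate
SERVERS = 2
MAX_UTILIZATION = 0.80     # from the scalability requirement
ANNUAL_GROWTH = 0.10       # from the scalability requirement


def utilization(year, servers_up=SERVERS):
    """Per-server utilization at daily peak after `year` years of compounded growth."""
    load = ESTIMATED_PEAK_CPUS * (1 + ANNUAL_GROWTH) ** year
    return load / (servers_up * CPUS_PER_SERVER)


for year in range(6):
    normal = utilization(year)
    degraded = utilization(year, servers_up=1)
    flag = "ok" if normal <= MAX_UTILIZATION else "exceeds 80%"
    print(f"year {year}: both servers up {normal:.0%} ({flag}), one server down {degraded:.0%}")
```

Under these assumptions, each server starts at 50% utilization and stays below the 80% ceiling for four years of growth with both servers running. During a failover the surviving server carries the entire load, so the 80% ceiling is best read as applying to normal operation.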
The following figure shows an example of a failover strategy for the Calendar Server back-end and the Messaging Server message store. The Calendar Server back-end and message store are replicated on separate hardware servers and configured for failover with Sun Cluster software. The number of CPUs and corresponding memory are replicated on each server in the Sun Cluster.
Directory services can be replicated to distribute transactions across different servers, providing high availability. Directory Server provides various strategies for replication of services, including the following:
Multiple databases. Stores different portions of a directory tree in separate databases.
Chaining and referrals. Links distributed data into a single directory tree.
Single master replication. Provides a central source for the master database, which is then distributed to consumer replicas.
Multi-master replication. Distributes the master database among several servers. Each of these masters then distributes its database among consumer replicas.
Availability strategies for Directory Server are a complex topic that is beyond the scope of this guide. The following sections, Single Master Replication and Multi-Master Replication, provide a high-level view of basic replication strategies. For detailed information, see Chapter 12, Designing a Highly Available Deployment, in Sun Java System Directory Server Enterprise Edition 6.0 Deployment Planning Guide.
The following figure shows a single master replication strategy that illustrates basic replication concepts.
In single master replication, one instance of Directory Server manages the master directory database, logging all changes. The master database is replicated to any number of consumer databases. The consumer instances of Directory Server are optimized for read and search operations. Any write operation received by a consumer is referred back to the master. The master periodically updates the consumer databases.
Advantages of single master replication include:
Single instance of Directory Server optimized for database read and write operations
Any number of consumer instances of Directory Server optimized for read and search operations
Horizontal scalability for consumer instances of Directory Server
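The single master flow described above can be modeled in a few lines. This is a toy illustration only and does not use or represent the Directory Server API. The `Master` and `Consumer` classes are hypothetical, and the referral is simplified to a direct forward of the write, whereas an LDAP referral is normally returned to the client to follow.

```python
class Master:
    """Toy master: holds the authoritative data and logs every change."""

    def __init__(self):
        self.data = {}
        self.changelog = []  # ordered (key, value) changes
        self.consumers = []

    def write(self, key, value):
        self.data[key] = value
        self.changelog.append((key, value))

    def push_updates(self):
        """Periodic update: send each consumer the changes it has not yet applied."""
        for consumer in self.consumers:
            for key, value in self.changelog[consumer.applied:]:
                consumer.data[key] = value
            consumer.applied = len(self.changelog)


class Consumer:
    """Toy read-only replica that refers writes back to the master."""

    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # number of changelog entries already applied
        master.consumers.append(self)

    def read(self, key):
        return self.data.get(key)

    def write(self, key, value):
        # Simplified referral: forward the write instead of applying it locally.
        self.master.write(key, value)


master = Master()
replica = Consumer(master)
replica.write("uid=jdoe", "Jane Doe")  # referred to the master
print(replica.read("uid=jdoe"))        # None: the replica has not been updated yet
master.push_updates()
print(replica.read("uid=jdoe"))        # Jane Doe
```

The gap between the write and the next update is the window in which a consumer can return stale data, which is one trade-off of serving reads from replicas.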
The following figure shows a multi-master replication strategy that might be used to distribute directory access globally.
In multi-master replication, two or more instances of Directory Server manage the master directory database. Each master has a replication agreement that specifies procedures for synchronizing the master databases. Each master replicates to any number of consumer databases. As with single master replication, the consumer instances of Directory Server are optimized for read and search access. Any write operation received by a consumer is referred back to a master. The masters periodically update the consumer databases.
Multi-master replication strategy provides all the advantages of single master replication, plus an availability strategy that can provide load balancing for updates to the masters. You can also implement an availability strategy that provides local control of directory operations, which is an important consideration for enterprises with globally distributed data centers.