During the deployment design phase of the solution life cycle, you design a high-level deployment architecture and a low-level implementation specification, and prepare a series of plans and specifications necessary to implement the solution. Project approval occurs in the deployment design phase.
Deployment design begins with the deployment scenario created during the logical design and technical requirements phases of the solution life cycle. The deployment scenario contains a logical architecture and the quality of service (QoS) requirements for the solution. You map the components identified in the logical architecture across physical servers and other network devices to create a deployment architecture. The QoS requirements provide guidance on hardware configurations for performance, availability, scalability, and other related QoS specifications.
Designing the deployment architecture is an iterative process. You typically revisit the QoS requirements and reexamine your preliminary designs. You take into account the interrelationship of the QoS requirements, balancing the trade-offs and cost of ownership issues to arrive at an optimal solution that ultimately satisfies the business goals of the project.
Project approval occurs during the deployment design phase, generally after you have created the deployment architecture. Using the deployment architecture and possibly also implementation specifications described below, the actual cost of the deployment is estimated and submitted to the stakeholders for approval. Once the project is approved, contracts for completion of the deployment are signed and resources to implement the project are acquired and allocated.
During the deployment design phase, you might prepare any of the following specifications and plans:
Deployment architecture. A high-level architecture that depicts the mapping of a logical architecture to a physical environment. The physical environment includes the computing nodes in an intranet or Internet environment, processors, memory, storage devices, and other hardware and network devices.
Implementation specifications. Detailed specifications used as a blueprint for building the deployment. These specifications provide specifics on the computer and network hardware to acquire and describe the network layout for the deployment. Implementation specifications also include specifications for directory services, including details on a directory information tree (DIT) and the groups and roles defined for directory access.
Implementation plans. A group of plans that cover various aspects of implementing an enterprise software solution. Implementation plans include the following:
Migration plan. Describes the strategies and processes for migrating enterprise data and upgrading enterprise software. The migrated data must conform to the formats and standards of the newly installed enterprise applications. All enterprise software must be at correct release version levels to interoperate.
Installation plan. Derived from the deployment architecture, specifies hardware server names, installation directories, installation sequence, types of installation for each node, and the configuration information necessary to install and configure a distributed deployment.
User management plan. Includes migration strategies for data in existing directories and databases, directory design specifications that take into account the replication design specified in the deployment architecture, and procedures for provisioning directories with new content.
Test plan. Describes the procedures for testing the deployed software, including specific plans for developing prototype and pilot implementations, stress tests that determine the ability to handle projected loads, and functional tests that determine if planned functionality operates as expected.
Roll-out plan. Describes the procedures and schedule for moving the implementation from a planning and test environment to a production environment. Moving an implementation into production usually occurs in various phases. For example, the first phase might be deploying the software for a limited group of users and increasing the user base with each phase until the entire deployment is complete. Phased implementation can also include scheduled implementation of specific software packages until the entire deployment is complete.
Disaster recovery plan. Describes procedures for restoring the system after unexpected system-wide failures. The recovery plan includes procedures for both large-scale and small-scale failures.
Operations plan (Run Book). A manual of operations that describes monitoring, maintenance, installation, and upgrade procedures.
Training plan. Contains processes and procedures for training operators, administrators, and end users on the newly installed enterprise software.
Several factors influence the decisions you make during deployment design. Consider the following key factors:
Logical Architecture. The logical architecture details the functional services in a proposed solution and the interrelationships of the components providing those services. Use the logical architecture as a key to determining the best way to distribute services. A deployment scenario contains the logical architecture paired with quality of service requirements (described below).
Quality of service requirements. The quality of service (QoS) requirements specify various aspects of a solution’s operation. Use the QoS requirements to help develop strategies to achieve performance, availability, scalability, serviceability, and other quality of service goals. A deployment scenario contains the logical architecture (described previously) paired with quality of service requirements.
Usage analysis. Usage analysis, developed during the technical requirements phase of the solution life cycle, provides information on usage patterns that can help estimate load and stress on a deployed system. Use the usage analysis to help isolate performance bottlenecks and develop strategies to satisfy QoS requirements.
Use cases. Use cases, developed during the technical requirements phase of the solution life cycle, list the distinct user interactions identified for a deployment, often identifying the most common ones. Although the use cases are embodied in the usage analysis, when assessing a deployment design you should refer to the use cases to make sure that they are properly addressed.
Service level agreements. A service level agreement (SLA) specifies minimum performance requirements, and when those requirements are not met, the level and extent of customer support that must be provided. A deployment design should easily meet the performance requirements specified in a service level agreement.
Total cost of ownership. During deployment design you analyze potential solutions that address the QoS requirements for availability, performance, scalability, and others. However, for each solution you consider, you must also consider the cost of that solution and how that cost impacts the total cost of ownership. Make sure that you consider the trade-offs embodied by your decisions and that you have optimized your resources to achieve business requirements within business constraints.
Business goals. Business goals are stated during the business analysis phase of the solution life cycle and include the business requirements and business constraints to meet those goals. Deployment design is ultimately judged by its ability to satisfy the business goals.
As with other aspects of deployment planning, deployment design is as much an art as it is a science and cannot be detailed with specific procedures and processes. Factors that contribute to successful deployment design are past design experience, knowledge of systems architecture, domain knowledge, and applied creative thinking.
Deployment design typically revolves around achieving performance requirements while meeting other QoS requirements. The strategies you use must balance the trade-offs of your design decisions to optimize the solution. The methodology you use typically involves the following tasks:
Estimating processor requirements. Deployment design often begins with estimating the number of CPUs needed for each component in the logical architecture. Start with the use cases representing the heaviest load and continue through each use case. Consider the load on all components providing support to the use cases, and modify your estimates accordingly. Also consider any previous experience you have with designing enterprise systems.
Estimating processor requirements for secure transport. Study the use cases that require secure transport and modify CPU estimates accordingly.
Replicating services for availability and scalability. Once you are satisfied with the processor estimates, make modifications to the design to account for QoS requirements for availability and scalability. Consider load balancing solutions that address availability and failover considerations.
During your analysis, consider the trade-offs of your design decisions. For example, what effect does the availability and scalability strategy have on serviceability (maintenance) of the system? What are the other costs of the strategies?
Identifying bottlenecks. As you continue with your analysis, examine the deployment design to identify any bottlenecks that cause the transmission of data to fall beneath requirements, and make adjustments.
Optimizing resources. Review your deployment design for resource management and consider options that minimize costs while fulfilling requirements.
Managing risks. Revisit your business and technical analyses with respect to your design, making modifications to account for events or situations that might not have been foreseen in the earlier planning.
This section discusses a process for estimating the number of CPU processors and corresponding memory that are necessary to support the services in a deployment design. The section includes a walkthrough of an estimation process for an example communications deployment scenario.
The estimation of CPU computing power is an iterative process that considers the following:
Logical components and their interactions (as indicated by component dependencies in the logical architecture)
Usage analysis for the identified use cases
Quality of service requirements
Past experience with deployment design and with Java Enterprise System
Consultation with Sun professional services staff who have experience designing and implementing various types of deployment scenarios
The estimation process includes the following steps. The ordering of these steps is not critical, but provides one way to consider the factors that affect the final result.
Determine a baseline CPU estimate for components identified as user entry points to the system.
One design decision is whether to fully load or partially load CPUs. Fully loaded CPUs maximize the capacity of a system. To increase the capacity, you incur the maintenance cost and possible downtime of adding additional CPUs. In some cases, you can choose to add additional machines to meet growing performance requirements.
Partially loaded CPUs allow room to handle excess performance requirements without immediately incurring maintenance costs. However, there is an additional up front expense of the under-utilized system.
Make adjustments to the CPU estimates to account for interactions between components.
Study the interactions among components in the logical architecture to determine the extra load required because of dependent components.
Study the usage analysis for specific use cases to determine peak loads for the system, and then make adjustments to components that handle the peak loads.
Start with the most heavily weighted use cases (those requiring the most load), and continue with each use case to make sure you account for all projected usage scenarios.
Make adjustments to the CPU estimates to reflect security, availability, and scalability requirements.
This estimation process provides starting points for determining the actual processing power you need. Typically, you create prototype deployments based on these estimates and then perform rigorous testing against expected use cases. Only after iterative testing can you determine the actual processing requirements for a deployment design.
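The estimation steps just described can be sketched as a simple adjustment pipeline, in which each pass (component interactions, peak load, security, availability) adds per-component CPU deltas to a baseline estimate. This is a sketch only; the function name is an assumption, and the figures are illustrative values taken from the example tables later in this chapter:

```python
def apply_adjustments(baseline, *adjustments):
    """Apply successive per-component CPU deltas (component
    interactions, peak load, security, availability) to a
    baseline CPU estimate and return the adjusted totals."""
    totals = dict(baseline)
    for adjustment in adjustments:
        for component, delta in adjustment.items():
            totals[component] = totals.get(component, 0) + delta
    return totals

# Baseline figures and peak-load deltas from this chapter's example.
baseline = {"Messaging Server MTA (inbound)": 1, "Directory Server": 2}
peak_load = {"Messaging Server MTA (inbound)": 1, "Directory Server": 1}

adjusted = apply_adjustments(baseline, peak_load)
# Messaging Server MTA (inbound): 2 CPUs, Directory Server: 3 CPUs
```

Each later step (secure transport, replication for availability) would simply be another adjustment dictionary passed to the same helper.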
This section illustrates one methodology to estimate processing power required for an example deployment. The example deployment is based on the logical architecture for the identity-based communications solution for a medium-sized enterprise of about 1,000 to 5,000 employees, as described in the section Identity-Based Communications Example.
The CPU and memory figures used in the example are arbitrary estimates for illustration only; they derive from the arbitrary data on which this theoretical example is based. An exhaustive analysis of various factors is necessary to estimate processor requirements. This analysis would include, but not be limited to, the following information:
Detailed use cases and usage analysis based on an exhaustive business analysis
Quality of service requirements determined by analysis of business requirements
Specific costs and specifications of processing and networking hardware
Past experience implementing similar deployments
The information presented in these examples does not represent any specific implementation advice, other than to illustrate a process you might use when designing a system.
Begin by estimating the number of CPUs required to handle the expected load on each component that is a user entry point. The following figure shows the logical architecture for an identity-based communications scenario described previously in Identity-Based Communications Example.
The following table lists the components in the presentation tier of the logical architecture that interface directly with end users of the deployment. The table includes baseline CPU estimates derived from analysis of technical requirements, use cases, specific usage analysis, and past experience with this type of deployment.
Table 5–1 CPU Estimates for Components Containing User Entry Points

| Component | Number of CPUs | Description |
|---|---|---|
| Portal Server | 4 | Component that is a user entry point. |
| Communications Express | 2 | Routes data to Portal Server messaging and calendar channels. |
The components providing user entry points require support from other Java Enterprise System components. As you continue to specify performance requirements, add the performance estimates required for supporting components. The type of interactions among components should be detailed when designing the logical architecture, as described in the logical architecture examples in the section Example Logical Architectures.
Table 5–2 CPU Estimates for Supporting Components

| Component | CPUs | Description |
|---|---|---|
| Messaging Server MTA (inbound) | 1 | Routes incoming mail messages from Communications Express and e-mail clients. |
| Messaging Server MTA (outbound) | 1 | Routes outgoing mail messages to recipients. |
| Messaging Server MMP | 1 | Accesses the Messaging Server message store for e-mail clients. |
| Messaging Server (Message Store) | 1 | Retrieves and stores email messages. |
| Access Manager | 2 | Provides authorization and authentication services. |
| Calendar Server | 2 | Retrieves and stores calendar data for Communications Express, a Calendar Server front-end. |
| Directory Server | 2 | Provides LDAP directory services. |
| Web Server | 0 | Provides web container support for Portal Server and Access Manager. (No additional CPU cycles required.) |
Return to the use cases and usage analysis to identify areas of peak load usage and make adjustments to your CPU estimates.
For example, suppose for this example you identify the following peak load conditions:
Initial ramp up of users as they log on simultaneously
Email exchanges during specified time frames
To account for this peak load, make adjustments to the components providing these services. The following table outlines adjustments you might make.
Table 5–3 CPU Estimate Adjustments for Peak Load

| Component | CPUs (Adjusted) | Description |
|---|---|---|
| Messaging Server MTA (inbound) | 2 | Add 1 CPU for peak incoming email |
| Messaging Server MTA (outbound) | 2 | Add 1 CPU for peak outgoing email |
| Messaging Server MMP | 2 | Add 1 CPU for additional load |
| Messaging Server (Message Store) | 2 | Add 1 CPU for additional load |
| Directory Server | 3 | Add 1 CPU for additional LDAP lookups |
Continue with your CPU estimates to take into account other quality of service requirements that can impact load:
Security. From the technical requirements phase, determine how secure transport of data might affect the load requirements and make corresponding modifications to your estimates. The following section, Estimating Processor Requirements for Secure Transactions, describes a process for making adjustments.
Replication of services. Adjust CPU estimates to account for replication of services for availability, load balancing, and scalability considerations. The following section, Determining Availability Strategies, discusses sizing for availability solutions. The section Determining Strategies for Scalability discusses solutions involving available access to directory services.
Latent capacity and scalability. Modify CPU estimates as necessary to allow latent capacity for unexpected large loads on the deployment. Look at the anticipated milestones for scaling and projected load increase over time to make sure you can reach any projected milestones to scale the system, either horizontally or vertically.
Typically, you round up CPUs to an even number. Rounding up to an even number allows you to evenly split the CPU estimates between two physical servers and also adds a small factor for latent capacity. However, round up according to your specific needs for replication of services.
As a general rule, allow 2 gigabytes of memory for each CPU. The actual memory required depends on your specific usage and can be determined in testing.
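Taken together, the even-number rounding and the 2 gigabytes-per-CPU rule of thumb can be expressed as a small helper. This is a sketch under the stated assumptions; actual memory requirements must be determined in testing:

```python
import math

def size_component(cpu_estimate, gb_per_cpu=2):
    """Round a CPU estimate up to an even number and apply the
    2 GB-per-CPU memory rule of thumb. Returns (cpus, memory_gb)."""
    cpus = math.ceil(cpu_estimate / 2) * 2   # round up to an even number
    return cpus, cpus * gb_per_cpu

# Directory Server: an estimate of 3 CPUs becomes 4 CPUs and 8 GB.
```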
The following table lists the final estimates for the identity-based communications example. These estimates do not include the additional computing power that might be needed for security and availability; those totals are added in the following sections.
Table 5–4 CPU Estimate Adjustments for Supporting Components

| Component | CPUs | Memory |
|---|---|---|
| Portal Server | 4 | 8 GB |
| Communications Express | 2 | 4 GB |
| Messaging Server (MTA, inbound) | 2 | 4 GB |
| Messaging Server (MTA, outbound) | 2 | 4 GB |
| Messaging Server (MMP) | 2 | 4 GB |
| Messaging Server (Message Store) | 2 | 4 GB |
| Access Manager | 2 | 4 GB |
| Calendar Server | 2 | 4 GB |
| Directory Server | 4 | 8 GB (rounded up from 3 CPUs/6 GB memory) |
| Web Server | 0 | 0 |
Secure transport of data involves handling transactions over a secure transport protocol such as Secure Sockets Layer (SSL) or Transport Layer Security (TLS). Transactions handled over a secure transport typically require additional computing power, first to establish a secure session (known as the handshake) and then to encrypt and decrypt transported data. Depending on the encryption algorithm used (for example, 40-bit or 128-bit encryption algorithms), the additional computing power can be substantial.
For secure transactions to perform at the same level as nonsecure transactions, you must plan for additional computing power. Depending on the nature of the transaction and the Sun Java™ Enterprise System services that handle it, secure transactions might require up to four times more computing power than nonsecure transactions.
When estimating the processing power to handle secure transactions, analyze use cases to determine the percentage of transactions that require secure transport. If the performance requirements for secure transactions are the same as for non-secure transactions, modify the CPU estimates to account for the additional computing power needed for the secure transactions.
In some usage scenarios, secure transport might be required only for authentication. Once a user is authenticated to the system, no additional security measures for transport of data are required. In other scenarios, secure transport might be required for all transactions.
For example, when browsing a product catalog for an online e-commerce site, all transactions can be nonsecure until the customer has finished making selections and is ready to “check out” to make a purchase. However, some usage scenarios, such as deployments for banks or brokerage houses, require most or all transactions to be secure and apply the same performance standard for both secure and nonsecure transactions.
This section continues the example deployment to illustrate how to calculate CPU requirements for a theoretical use case that includes both secure and nonsecure transactions.
To estimate the CPU requirements for secure transactions, make the following calculations:
Start with a baseline figure for the CPU estimates (as illustrated in the previous section, Example Estimating Processor Requirements).
Calculate the percentage of transactions that require secure transport, and calculate the CPU estimates for the secure transactions.
Calculate reduced CPU estimates for non-secure transactions.
Tally the secure estimate and nonsecure estimate to calculate the total CPU estimates.
Round up the total CPU estimate to an even number.
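These steps amount to a short weighted calculation. The sketch below uses the figures from this example (4 baseline CPUs, 10% secure transactions, and the arbitrary four-times factor for secure transport); the function name is an assumption for illustration:

```python
import math

def secure_adjusted_cpus(baseline_cpus, secure_fraction, secure_factor=4):
    """Adjust a baseline CPU estimate when a fraction of transactions
    requires secure transport costing `secure_factor` times the CPU
    power of nonsecure transport."""
    secure = secure_fraction * baseline_cpus * secure_factor   # 1.6 CPUs
    nonsecure = (1 - secure_fraction) * baseline_cpus          # 3.6 CPUs
    total = secure + nonsecure                                 # 5.2 CPUs
    return math.ceil(total / 2) * 2    # round up to an even number

# Portal Server: 4 baseline CPUs with 10% secure logins -> 6 CPUs
```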
CPU Estimates for Secure Transactions shows an example calculation based on use cases and usage analysis for the Portal Server that assume the following:
All logins require secure authentication.
All logins account for 10% of the total Portal Server load.
The performance requirement for secure transactions is the same as the performance requirement for non-secure transactions.
To account for the extra computing power to handle secure transactions, the number of CPUs to handle these transactions will be increased by a factor of four. As with other CPU figures in the example, this factor is arbitrary and is for illustration purposes only.
Table 5–5 CPU Estimates for Secure Transactions

| Step | Description | Calculation | Result |
|---|---|---|---|
| 1 | Start with the baseline estimate for all Portal Server transactions. | Baseline estimate from Study Use Cases for Peak Load Usage is 4 CPUs. | |
| 2 | Calculate additional CPU estimates for secure transactions. Assume secure transactions require four times the CPU power of nonsecure transactions. | Ten percent of the baseline estimate requires secure transport: 0.10 x 4 CPUs = 0.4 CPUs. Increase CPU power for secure transactions by a factor of four: 4 x 0.4 = 1.6 CPUs | 1.6 CPUs |
| 3 | Calculate reduced CPU estimates for nonsecure transactions. | Ninety percent of the baseline estimate is nonsecure: 0.9 x 4 CPUs = 3.6 CPUs | 3.6 CPUs |
| 4 | Calculate adjusted total CPU estimates for secure and nonsecure transactions. | Secure estimate + nonsecure estimate = total: 1.6 CPUs + 3.6 CPUs = 5.2 CPUs | 5.2 CPUs |
| 5 | Round up to an even number. | 5.2 CPUs ==> 6 CPUs | 6 CPUs |
From the calculations for secure transactions in this example, you would modify the total CPU estimate for Portal Server by adding two CPUs and four gigabytes of memory, resulting in the following total:
| Component | CPUs | Memory |
|---|---|---|
| Portal Server | 6 | 12 GB |
Specialized hardware devices, such as SSL accelerator cards and other appliances, are available to provide computing power to handle establishment of secure sessions and the encryption and decryption of data. When using specialized hardware for SSL operations, computational power is dedicated to some part of the SSL computations, typically the “handshake” operation that establishes a secure session.
This hardware might be of benefit to your final deployment architecture. However, because of the specialized nature of the hardware, estimate secure transaction performance requirements first in terms of CPU power, and then consider the benefits of using specialized hardware to handle the additional load.
Some factors to consider when using specialized hardware are whether the use cases support using the hardware (for example, use cases that require a large number of SSL handshake operations) and the added layer of complexity this type of hardware brings to the design. This complexity includes the installation, configuration, testing, and administration of these devices.
When developing a strategy for availability requirements, study the component interactions and usage analysis to determine which availability solutions to consider. Do your analysis on a component-by-component basis, determining a best-fit solution for availability and failover requirements.
The following items are examples of the type of information you gather to help determine availability strategies:
How many nines of availability are specified?
What are the performance specifications with respect to failover situations (for example, at least 50% of performance during failover)?
Does the usage analysis identify times of peak and non-peak usage?
What are the geographical considerations?
The availability strategy you choose must also take into consideration serviceability requirements, as discussed in Designing for Optimum Resource Usage. Avoid complex solutions that require considerable administration and maintenance.
Availability strategies for Java Enterprise System deployments include the following:
Load balancing. Uses redundant hardware and software components to share a processing load. A load balancer directs any requests for a service to one of multiple symmetric instances of the service. If any one instance should fail, other instances are available to assume a heavier load.
Failover. Involves managing redundant hardware and software to provide continuous access to services and security for critical data if any component fails.
Sun Cluster software provides a failover solution for critical data managed by back-end components such as the message storage for Messaging Server and calendar data for Calendar Server.
Replication of services. Replication of services provides multiple sources for access to the same data. Directory Server provides numerous replication and synchronization strategies for LDAP directory access.
The following sections provide some examples of availability solutions that provide various levels of load balancing, failover, and replication of services.
Place all computing resources for a service on a single server. If the server fails, the entire service fails.
Sun provides high-end servers that provide the following benefits:
Replacement and reconfiguration of hardware components while the system is running
Ability to run multiple applications in fault-isolated domains on the server
Ability to upgrade capacity, performance speed, and I/O configuration without rebooting the system
A high-end server typically costs more than a comparable multi-server system. However, a single server provides savings on administration, monitoring, and hosting costs for servers in a data center. Load balancing, failover, and removal of single points of failure are more flexible with multi-server systems.
There are several ways to increase availability with parallel redundant servers that provide both load balancing and failover. The following figure illustrates two replicated servers providing an N+1 failover system. An N+1 system has an additional server to provide 100% capacity should one server fail.
The computing power of each server in Horizontally Redundant Systems above is identical. One server alone handles the performance requirements. The other server provides 100% of the performance when called into service as a backup.
The advantage of an N+1 failover design is 100% performance during a failover situation. Disadvantages include increased hardware costs with no corresponding gain in overall performance (because one server is a standby for use in failover situations only).
The following figure illustrates a system that implements load balancing plus failover that distributes the performance between two servers.
In the system depicted in Horizontally Redundant Systems above, if one server fails, all services are available, although at a percentage of the full capacity. The remaining server provides 6 CPUs of computing power, which is 60% of the 10 CPU requirement.
An advantage of this design is the additional 2 CPU latent capacity when both servers are available.
The following figure illustrates a distribution between a number of servers for performance and load balancing.
Because there are five servers in the design depicted in Horizontally Redundant Systems, if one server fails, the remaining servers provide a total of 8 CPUs of computing power, which is 80% of the 10 CPU performance requirement. If you add an additional server with a 2-CPU capacity to the design, you effectively have an N+1 design. If one server fails, 100% of the performance requirement is met by the remaining servers.
This design includes the following advantages:
Added performance if a single server fails
Availability even when more than one server is down
Servers can be rotated out of service for maintenance and upgrades
Multiple low-end servers typically cost less than a single high-end server
However, administration and maintenance costs can increase significantly with additional servers. You also have to consider costs for hosting the servers in a data center. At some point you run into diminishing returns by adding additional servers.
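The surviving-capacity arithmetic used throughout these horizontal redundancy examples can be sketched as follows (a sketch; the server CPU counts are the figures from this section, and the function name is an assumption):

```python
def surviving_capacity(server_cpus, required_cpus, failures=1):
    """Fraction of the CPU requirement still met after the largest
    `failures` servers fail (the worst case for unequal servers)."""
    survivors = sorted(server_cpus)[:len(server_cpus) - failures]
    return sum(survivors) / required_cpus

# Two 6-CPU servers, 10-CPU requirement: 60% capacity after a failure.
# Five 2-CPU servers: 80%. A sixth 2-CPU server (N+1): 100%.
```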
For situations that require a high degree of availability (such as four or five nines), you might consider Sun Cluster software as part of your availability design. A cluster system is the coupling of redundant servers with storage and other network resources. The servers in a cluster continually communicate with each other. If one of the servers goes offline, the remainder of the devices in the cluster isolate the server and fail over any application or data from the failing node to another node. This failover process is achieved relatively quickly with little interruption of service to the users of the system.
Sun Cluster software requires additional dedicated hardware and specialized skills to configure, administer, and maintain.
This section contains two examples of availability strategies based on the identity-based communications solution for a medium-sized enterprise of about 1,000 to 5,000 employees, as described previously in Identity-Based Communications Example. The first availability strategy illustrates load balancing for Messaging Server. The second illustrates a failover solution that uses Sun Cluster software.
The following table lists the CPU estimates for each Messaging Server component in the logical architecture. This table repeats the final estimates calculated in the section Update the CPU Estimates.
Table 5–6 CPU Estimate Adjustments for Supporting Components

| Component | CPUs | Memory |
|---|---|---|
| Messaging Server (MTA, inbound) | 2 | 4 GB |
| Messaging Server (MTA, outbound) | 2 | 4 GB |
| Messaging Server (MMP) | 2 | 4 GB |
| Messaging Server (Message Store) | 2 | 4 GB |
For this example, assume that during the technical requirements phase, the following quality of service requirements were specified:
Availability. Overall system availability should be 99.99% (does not include scheduled downtime). Failure of an individual computer system should not result in service failure.
Scalability. No server should be more than 80% utilized under daily peak load, and the system must accommodate long-term growth of 10% per year.
To fulfill the availability requirement, provide two instances of each Messaging Server component, each instance on a separate hardware server. If the server for one instance fails, the other instance provides the service. The following figure illustrates the network diagram for this availability strategy.
In the preceding figure, the number of CPUs is doubled from the original estimate for the following reasons:
In the event one server fails, the remaining server provides the CPU power to handle the load.
The added CPU power provides the safety margin needed to meet the scalability requirement that no server be more than 80% utilized under peak load.
The added CPU power also provides latent capacity to handle the projected 10% annual growth in load until additional scaling is needed.
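The reasoning behind doubling the estimate can be sketched as a quick sizing check. The function below is an illustrative model, not a formal sizing method from this guide: it gives every server the full baseline estimate so that a single surviving server can carry the whole load, then confirms that normal-operation utilization stays under the 80% ceiling.

```python
def doubled_deployment(baseline_cpus, n_servers=2, max_utilization=0.80):
    """Give every server the full baseline CPU estimate so any single
    server can carry the whole load if its peer fails. Returns the
    per-server CPU count and normal-operation utilization (a sketch)."""
    per_server = baseline_cpus                  # full estimate on each server
    normal_load = baseline_cpus / n_servers     # each server's share of load
    utilization = normal_load / per_server
    # Scalability requirement: no server more than 80% utilized at peak.
    assert utilization <= max_utilization, "utilization ceiling exceeded"
    return per_server, utilization

# Baseline of 2 CPUs per Messaging Server component, on two servers:
print(doubled_deployment(2))  # (2, 0.5) -- 50% utilized in normal operation
```

Running at 50% utilization in normal operation leaves headroom both for the 80% peak-load ceiling and for several years of 10% annual growth.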
The following figure shows an example of a failover strategy for the Calendar Server back-end and the Messaging Server message store. The Calendar Server back-end and the message store are replicated on separate hardware servers and configured for failover with Sun Cluster software. The number of CPUs and the corresponding memory are replicated on each server in the cluster.
Directory services can be replicated to distribute transactions across different servers, providing high availability. Directory Server provides various strategies for replication of services, including the following:
Multiple databases. Stores different portions of a directory tree in separate databases.
Chaining and referrals. Links distributed data into a single directory tree.
Single master replication. Provides a central source for the master database, which is then distributed to consumer replicas.
Multi-master replication. Distributes the master database among several servers. Each master then distributes its database among consumer replicas.
Availability strategies for Directory Server are a complex topic that is beyond the scope of this guide. The following sections, Single Master Replication and Multi-Master Replication, provide a high-level view of basic replication strategies. For detailed information, see Chapter 12, Designing a Highly Available Deployment, in Sun Java System Directory Server Enterprise Edition 6.0 Deployment Planning Guide.
The following figure shows a single master replication strategy that illustrates basic replication concepts.
In single master replication, one instance of Directory Server manages the master directory database, logging all changes. The master database is replicated to any number of consumer databases. The consumer instances of Directory Server are optimized for read and search operations. Any write operation received by a consumer is referred back to the master. The master periodically updates the consumer databases.
Advantages of single master replication include:
Single instance of Directory Server optimized for database read and write operations
Any number of consumer instances of Directory Server optimized for read and search operations
Horizontal scalability for consumer instances of Directory Server
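The referral and update behavior described above can be illustrated with a toy model. The class and method names below are invented for illustration only; this is not the Directory Server API.

```python
class Master:
    """Single master: logs every change, periodically updates consumers."""
    def __init__(self):
        self.data = {}
        self.changelog = []
        self.consumers = []

    def add_consumer(self, consumer):
        self.consumers.append(consumer)

    def write(self, key, value):
        self.data[key] = value
        self.changelog.append((key, value))   # all changes are logged

    def replicate(self):
        # Periodic update pushed to every consumer database.
        for consumer in self.consumers:
            consumer.data.update(self.data)

class Consumer:
    """Read-optimized replica; any write is referred back to the master."""
    def __init__(self, master):
        self.master = master
        self.data = {}

    def search(self, key):
        return self.data.get(key)

    def write(self, key, value):
        # Consumers never accept writes directly.
        self.master.write(key, value)

master = Master()
replica = Consumer(master)
master.add_consumer(replica)
replica.write("uid=jdoe", "Jane Doe")   # referred to the master
print(replica.search("uid=jdoe"))       # None -- not yet replicated
master.replicate()
print(replica.search("uid=jdoe"))       # Jane Doe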
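placeholder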
The following figure shows a multi-master replication strategy that might be used to distribute directory access globally.
In multi-master replication, one or more instances of Directory Server manage the master directory database. Each master has a replication agreement that specifies procedures for synchronizing the master databases. Each master replicates to any number of consumer databases. As with single master replication, the consumer instances of Directory Server are optimized for read and search access. Any write operation received by a consumer is referred back to the master. The master periodically updates the consumer databases.
A multi-master replication strategy provides all the advantages of single master replication, plus an availability strategy that can provide load balancing for updates to the masters. You can also implement an availability strategy that provides local control of directory operations, which is an important consideration for enterprises with globally distributed data centers.
Scalability is the ability to add capacity to your system, usually by the addition of system resources, but without changes to the deployment architecture. During requirements analysis, you typically make projections of expected growth to a system based on the business requirements and subsequent usage analysis. These projections of the number of users of a system and the capacity of the system to meet their needs are often estimates that can vary significantly from the actual numbers for the deployed system. Your design should be flexible enough to allow for variance in your projections.
A scalable design includes sufficient latent capacity to handle increased loads until the system can be upgraded with additional resources, and can readily accommodate increasing loads without redesign of the system.
Latent capacity is one aspect of scalability where you include additional performance and availability resources into your system so the system can easily handle unusual peak loads. You can also monitor how latent capacity is used in a deployed system to help determine when to scale the system by adding resources. Latent capacity is one way to build safety into your design.
Analysis of use cases can help identify the scenarios that can create unusual peak loads. Use this analysis of unusual peak loads plus a factor to cover unexpected growth to design latent capacity that builds safety into your system.
Your system design should be able to handle projected capacity for a reasonable time, generally the first 6 to 12 months of operation. Maintenance cycles can be used to add resources or increase capacity as needed. Ideally, you should be able to schedule upgrades to the system on a regular basis, but predicting needed increases in capacity is often difficult. Rely on careful monitoring of your resources as well as business projections to determine when to upgrade a system.
If you plan to implement your solution in incremental phases, you might schedule increasing the capacity of the system to coincide with other improvements scheduled for each incremental phase.
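Because predicting needed capacity increases is difficult, a simple projection can at least suggest when latent capacity is likely to be exhausted. The following sketch, with an assumed compound-growth model, estimates the number of months before a growing load reaches capacity; in practice you would rely on monitoring of the deployed system rather than this kind of projection alone.

```python
import math

def months_until_upgrade(current_load, capacity, monthly_growth):
    """Whole months before a load growing at a compound monthly rate
    reaches capacity. A planning sketch, not a substitute for monitoring."""
    if current_load >= capacity:
        return 0
    # Solve current_load * (1 + g)**m = capacity for m, then round down.
    months = math.log(capacity / current_load) / math.log(1 + monthly_growth)
    return math.floor(months)

# A server at 60% utilization with an 80% ceiling, load growing 2% a month:
print(months_until_upgrade(60, 80, 0.02))  # 14
```

A result like this could be used to schedule a capacity upgrade within a regular maintenance cycle well before the ceiling is reached.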
The example in this section illustrates horizontal and vertical scaling for a solution that implements Messaging Server. For vertical scaling, you add additional CPUs to a server to handle increasing loads. For horizontal scaling, you handle increasing loads by adding additional servers for distribution of the load.
The baseline for the example assumes a 50,000 user base supported by two message store instances that are distributed for load balancing. Each server has two CPUs for a total of four CPUs. The following figure shows how this system can be scaled to handle increasing loads for 250,000 users and 2,000,000 users.
Scalability Example shows the differences between vertical scaling and horizontal scaling. This figure does not show other factors to consider when scaling, such as load balancing, failover, and changes in usage patterns.
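The two scaling paths can be expressed as simple arithmetic. The users-per-CPU ratio below is an assumption derived from the baseline in this example (50,000 users on 4 CPUs); real sizing would also account for load balancing, failover, and usage-pattern changes, which this sketch ignores.

```python
import math

USERS_PER_CPU = 12_500  # assumed from the baseline: 50,000 users on 4 CPUs

def vertical_scale(users, n_servers=2):
    """Keep the server count fixed; return CPUs needed per server."""
    total_cpus = users / USERS_PER_CPU
    return math.ceil(total_cpus / n_servers)

def horizontal_scale(users, cpus_per_server=2):
    """Keep the server size fixed; return the number of servers needed."""
    total_cpus = users / USERS_PER_CPU
    return math.ceil(total_cpus / cpus_per_server)

print(vertical_scale(250_000))    # 10 CPUs on each of the two servers
print(horizontal_scale(250_000))  # 10 two-CPU servers
```

Both paths supply the same total CPU count; the trade-off lies elsewhere, for example in per-server cost ceilings versus the maintenance overhead of many servers.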
One of the keys to successful deployment design is identifying potential performance bottlenecks and developing a strategy to avoid them. A performance bottleneck occurs when the rate at which data is accessed cannot meet specified system requirements.
Bottlenecks can be categorized according to various classes of hardware, as listed in the following table of data access points within a system. This table also suggests potential remedies for bottlenecks in each hardware class.
Table 5–7 Data Access Points
| Hardware Class | Relative Access Speed | Remedies for Performance Improvement |
|---|---|---|
| Processor | Nanoseconds | Vertical scaling: add more processing power, improve processor cache. Horizontal scaling: add parallel processing power for load balancing. |
| System memory (RAM) | Microseconds | Dedicate system memory to specific tasks. Vertical scaling: add additional memory. Horizontal scaling: create additional instances for parallel processing and load balancing. |
| Disk read and write | Milliseconds | Optimize disk access with disk arrays (RAID). Dedicate disk access to specific functions, such as read only or write only. Cache frequently accessed data in system memory. |
| Network interface | Varies depending on bandwidth and access speed of nodes on the network | Increase bandwidth. Add accelerator hardware when transporting secure data. Improve performance on nodes within the network so that data is more readily available. |
Identifying Performance Bottlenecks lists hardware classes according to relative access speed, implying that slow access points, such as disks, are more likely to be the source of bottlenecks. However, processors that are underpowered to handle large loads are also likely sources of bottlenecks.
You typically begin deployment design with baseline processing power estimates for each component in the deployment and their dependencies. You then determine how to avoid bottlenecks related to system memory and disk access. Finally, you examine the network interface to determine potential bottlenecks and focus on strategies to overcome them.
A critical component of deployment design is the speed of disk access to frequently accessed datasets, such as LDAP directories. Disk access provides the slowest access to data and is a likely source of a performance bottleneck.
One way to optimize disk access is to separate write operations from read operations. Not only are write operations more expensive than read operations, but read operations (lookup operations for LDAP directories) also occur considerably more frequently than write operations (updates to data in LDAP directories).
Another way to optimize disk access is by dedicating disks to different types of I/O operations. For example, provide separate disk access for Directory Server logging operations, such as transaction logs and event logs, and LDAP read and write operations.
Also, consider implementing one or more instances of Directory Server dedicated to read and write operations, and using replicated instances distributed to local servers for read and search access. Chaining and linking options are also available to optimize access to directory services.
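Caching frequently accessed data in system memory, one of the remedies listed earlier, can be sketched as a small read-through cache. This is a toy illustration of the principle, not Directory Server's actual caching implementation; the class and attribute names are invented.

```python
from collections import OrderedDict

class ReadCache:
    """Tiny LRU read-through cache: frequent lookups are served from
    memory so the slow disk path is hit only on a miss (a sketch)."""
    def __init__(self, backing_store, capacity=1024):
        self.store = backing_store          # stand-in for the disk store
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def lookup(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)     # mark as most recently used
            return self.cache[key]
        self.misses += 1
        value = self.store[key]             # slow path: disk read
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return value

store = {"uid=jdoe": "Jane Doe"}            # stand-in for on-disk data
cache = ReadCache(store)
cache.lookup("uid=jdoe")                    # miss: read from "disk"
cache.lookup("uid=jdoe")                    # hit: served from memory
print(cache.hits, cache.misses)             # 1 1
```

The hit and miss counters illustrate why sizing memory for cache access (discussed in the planning guide chapter cited below) matters: every hit avoids a millisecond-scale disk operation.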
Chapter 6, Tuning System Characteristics and Hardware Sizing, in Sun Java System Directory Server Enterprise Edition 6.0 Deployment Planning Guide discusses various factors in planning for disk access. Topics in this chapter include:
Minimum memory and disk space requirements. Provides estimates for disk and memory needed for various sizes of directories.
Sizing physical memory for cache access. Provides guidance on estimating cache size according to planned usage of Directory Server and on planning total memory usage.
Sizing disk subsystems. Provides information on planning disk space requirements according to directory suffixes and Directory Server factors that affect disk use, and on distributing files across disks, including various disk array alternatives.
Deployment design is more than estimating the resources required to meet the QoS requirements. During deployment design, you also analyze all available options and select the best solution that minimizes cost while still fulfilling the QoS requirements. You must analyze the trade-off for each design decision to make sure a benefit in one area is not offset by a cost in another.
For example, horizontal scaling for availability might increase overall availability, but at the cost of increased maintenance and service. Vertical scaling for performance might increase computing power inexpensively, but the additional power might be used inefficiently by some services.
Before completing your design strategy, examine your decisions to make sure that you have balanced the use of resources with the overall benefit to the proposed solution. This analysis typically involves examining how system qualities in one area affect other system qualities. The following table lists some system qualities and corresponding considerations for resource management.
Table 5–8 Resource Management Considerations
| System Quality | Description |
|---|---|
| Performance | For performance solutions that concentrate CPUs on individual servers, will the services be able to use the computing power efficiently? (For example, some services have a ceiling on the number of CPUs that can be used efficiently.) |
| Latent capacity | Does your strategy handle loads that exceed performance estimates? Are excessive loads handled with vertical scaling on servers, load balancing to other servers, or both? Is the latent capacity sufficient to handle unusual peak loads until you reach the next milestone for scaling the deployment? |
| Security | Have you sufficiently accounted for the performance overhead required to handle secure transactions? |
| Availability | For horizontally redundant solutions, have you sufficiently estimated long-term maintenance expenses? Have you accounted for the scheduled downtime necessary to maintain the system? Have you balanced the costs between high-end servers and low-end servers? |
| Scalability | Have you estimated milestones for scaling the deployment? Do you have a strategy to provide enough latent capacity to handle projected increases in load until you reach those milestones? |
| Serviceability | Have you taken administration, monitoring, and maintenance costs into account in your availability design? Have you considered delegated administration solutions (allowing end users to perform some administration tasks) to reduce administration costs? |
Much of the information on which deployment design is based, such as quality of service requirements and usage analysis, is not empirical data but data based on estimates and projections ultimately derived from business analyses. These projections could be inaccurate for many reasons, including unforeseen circumstances in the business climate, faulty methods of gathering data, or simply human error. Before completing a deployment design, revisit the analyses upon which your design is based and make sure your design accounts for any reasonable deviations from the estimates or projections.
For example, if the usage analysis underestimates the actual usage of the system, you run the risk of building a system that cannot cope with the amount of traffic it encounters. A design that underperforms will surely be considered a failure.
On the other hand, if you build a system that is several orders of magnitude more powerful than required, you divert resources that could be used elsewhere. The key is to include a margin of safety above the requirements while avoiding extravagant use of resources.
Extravagant use of resources results in a failure of the design because underutilized resources could have been applied to other areas. Additionally, extravagant solutions might be perceived by stakeholders as not fulfilling contracts in good faith.
The following figure represents a completed deployment architecture for the example deployment introduced earlier in this white paper. This figure provides an idea of how to present a deployment architecture.
The deployment architecture in the following figure is for illustration purposes only. It does not represent a deployment that has actually been designed, built, or tested, and should not be considered deployment planning advice.