Sun Java Enterprise System Deployment Planning Guide

Chapter 3 Technical Requirements

During the technical requirements phase of the solution life cycle, you perform a usage analysis, identify use cases, and determine quality of service requirements for the proposed deployment solution.

About Technical Requirements

Technical requirements analysis begins with the business requirements documents created during the business analysis phase of the solution life cycle. Using the business analysis as a basis, you do the following:

Perform a usage analysis to aid in determining expected load conditions.
Create use cases that model typical user interaction with the system.
Create a set of quality of service (QoS) requirements that define how a deployed solution must perform in areas such as response time, availability, and security.

The quality of service requirements are derived from the usage analysis and the use cases, keeping in mind business requirements and constraints previously identified.

The quality of service requirements are later paired with logical architectures in the logical design phase to form a deployment scenario. The deployment scenario is the main input to the deployment design phase of the solution life cycle.

As with business analysis, no simple formula for technical requirements analysis exists that generates the usage analysis, use cases, and system requirements. Technical requirements analysis requires an understanding of the business domain, business objectives, and the underlying system technology.

Usage Analysis

Usage analysis involves identifying the various users of the solution you are designing and determining the usage patterns for those users. The information you gather provides a basis for estimating the load conditions on the system. Usage analysis information is also useful when assigning weights to use cases, as described in Use Cases.

During usage analysis, you should interview users whenever possible, research existing data on usage patterns, and interview builders and administrators of previous systems. The following table lists factors to consider when performing a usage analysis.

Table 3–1 Usage Analysis Factors


Topic	Description
Number and type of users	Identify how many users your solution must support, and categorize those users, if necessary. For example: A Business to Customer (B2C) solution might have a large number of visitors, but only a small number of users who register and engage in business transactions. A Business to Employee (B2E) solution typically accommodates each employee, although some employees might need access from outside the corporate network. In a B2E solution, managers might need authorization to areas that regular employees cannot access.
Active and inactive users	Identify the usage patterns and ratios of active and inactive users. Active users are users who are logged in to the system and interact with the system’s services. Inactive users can be users who are not logged in, users who log in but do not interact with the system’s components, or users who are in the database but never log in.
Administrative users	Identify users that access the deployed system to monitor, update, and support the deployment. Determine any specific administrative usage patterns that might affect technical requirements (for example, administration of the deployment from outside the firewall).
Usage patterns	Identify how various types of users access the system and provide targets for expected usage. For example: Are there peak times when usage spikes? What are normal business hours? Are users distributed globally? What is the expected duration of user connectivity?
User growth	Determine if the size of the user base is fixed or if the deployment expects growth in the number of users. If the user base is expected to grow, try to create reasonable projections of the growth.
User transactions	Identify the type of user transactions that must be supported. These user transactions can be translated into use cases. For example: What tasks do users perform? When users log in, do they remain logged in? Do they typically perform a few tasks and log out? Will significant collaboration between users require common calendars, web-conferences, and deployment of internal web pages?
User studies and statistical data	Use pre-existing user studies and other sources to determine patterns of user behavior. Often, enterprises or industry organizations have user research studies from which you can extract useful information about users. Log files for existing applications might contain statistical data useful in making estimates for a system.
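
As an illustration of how these factors can feed a load estimate, the following Python sketch derives a peak active-user count and request rate from figures gathered during usage analysis. The class, field names, and sample numbers are hypothetical and are not part of Java Enterprise System. The sketch assumes that growth compounds yearly and that a fixed fraction of registered users is active at peak.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageProfile:
    """Hypothetical container for figures gathered during usage analysis."""
    registered_users: int                    # everyone in the user database
    active_ratio: float                      # fraction of users active at peak (0.0 to 1.0)
    requests_per_active_user_per_min: float  # assumed average request rate
    yearly_growth: float                     # for example, 0.20 for 20% growth per year


def peak_active_users(profile: UsageProfile, years_ahead: int = 0) -> int:
    """Estimate the number of active users at peak, optionally projected forward."""
    projected = profile.registered_users * (1 + profile.yearly_growth) ** years_ahead
    return round(projected * profile.active_ratio)


def peak_requests_per_second(profile: UsageProfile, years_ahead: int = 0) -> float:
    """Convert the peak active-user estimate into a request rate."""
    users = peak_active_users(profile, years_ahead)
    return users * profile.requests_per_active_user_per_min / 60.0


if __name__ == "__main__":
    # Sample figures are invented for illustration only.
    b2e = UsageProfile(registered_users=10_000, active_ratio=0.10,
                       requests_per_active_user_per_min=6, yearly_growth=0.20)
    print(peak_active_users(b2e), peak_requests_per_second(b2e))
    print(peak_active_users(b2e, years_ahead=2))
```

Estimates like these are only as reliable as the ratios behind them, so it is worth replacing the assumed values with data from log files or user studies when such data exists.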

Use Cases

Use cases model typical user interaction with the solution that you are designing, and describe the complete flow of an operation from the perspective of an end user. Prioritizing the design around a complete set of use cases ensures a continual focus on the delivery of expected functionality. Use cases are the principal input to logical design.

Assign relative weights to use cases, with the highest weighted use cases representing the most common user tasks. The weighting of use cases allows you to focus your design decisions on the system services that are used the most.
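
One possible way to record relative weights is sketched below in Python. The use case names and raw weights are hypothetical; the guide does not prescribe a scale, so the sketch simply normalizes whatever raw scores you assign so that they sum to 1 and then ranks the use cases.

```python
def normalize_weights(raw_weights: dict[str, float]) -> dict[str, float]:
    """Scale raw use case weights so that they sum to 1.0."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: weight / total for name, weight in raw_weights.items()}


def rank_use_cases(raw_weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (use case, normalized weight) pairs, most heavily weighted first."""
    normalized = normalize_weights(raw_weights)
    return sorted(normalized.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical use cases and raw weights, for illustration only.
    weights = {"Read mail": 50, "Log in": 30, "Update calendar": 15, "Change password": 5}
    for name, share in rank_use_cases(weights):
        print(f"{name}: {share:.0%}")
```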

Use cases can be described at two levels.

Use-case reports. Descriptions of individual use cases, including primary and alternative flows of events.
Use-case diagrams. Diagrams depicting the relationships among actors and the use cases, presenting a more formal organization of the flow of events. Use-case diagrams are useful to model long or complex use cases. Typically, you use Unified Modeling Language (UML) standards to draw use-case diagrams.

Quality of Service Requirements

Quality of service (QoS) requirements are technical specifications that define system qualities such as performance, availability, scalability, and serviceability. QoS requirements are driven by business needs specified in the business requirements. For example, if services must be available 24 hours a day throughout the year, the availability requirement must address this business requirement.

The following table lists the system qualities that typically form a basis for QoS requirements.

Table 3–2 System Qualities Affecting QoS Requirements


System Quality	Description
Performance	The measurement of response time and throughput with respect to user load conditions.
Availability	A measure of how often a system’s resources and services are accessible to end users, often expressed as the uptime of a system.
Scalability	The ability to add capacity (and users) to a deployed system over time. Scalability typically involves adding resources to the system but should not require changes to the deployment architecture.
Security	A complex combination of factors that describe the integrity of a system and its users. Security includes authentication and authorization of users, security of data, and secure access to a deployed system.
Latent capacity	The ability of a system to handle unusual peak loads without additional resources. Latent capacity is a factor in availability, performance, and scalability qualities.
Serviceability	The ease by which a deployed system can be maintained, including monitoring the system, repairing problems that arise, and upgrading hardware and software components.

System qualities are closely interrelated. Requirements for one system quality might affect the requirements and design for other system qualities. For example, higher levels of security might affect performance, which in turn might affect availability. Adding servers to address availability issues affects serviceability (maintenance costs).

Understanding how system qualities are interrelated and the trade-offs that must be made is the key to designing a system that successfully satisfies both business requirements and business constraints.

The following sections describe further the system qualities that impact deployment design, providing guidance on factors to consider when formulating QoS requirements. A section on service level requirements, which form the basis of service level agreements, is also included.

Performance

Business requirements typically express performance in nontechnical terms that specify response time. For example, a business requirement for web-based access might state the following:

Users expect a reasonable response time upon login, typically no greater than four seconds.

Starting with this business requirement, examine all use cases to determine how to express this requirement at a system level. In some cases, you might want to include user load conditions determined during usage analysis. Express the performance requirement for each use case in terms of response time under specified load conditions or response time plus throughput. You might also specify the allowable number of errors.

Here are two examples of how to specify system requirements for performance:

Response for web page refresh must be no greater than four seconds throughout the day, sampled at 15-minute intervals, with fewer than 3.4 errors per million transactions.
During defined peak periods, the system must allow 25 secure logins per second with response time no greater than 12 seconds for any user and with fewer than 3.4 errors per million transactions.
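
Requirements written this way can be checked mechanically. The following Python sketch, with hypothetical names and sample measurements, tests a set of sampled response times and an error count against a requirement of the first form above (a maximum response time plus a maximum error rate per million transactions). It is one possible check, not a prescribed test procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceRequirement:
    """Hypothetical representation of a performance requirement."""
    max_response_seconds: float    # response time must be no greater than this
    max_errors_per_million: float  # error rate must be lower than this


def meets_requirement(req: PerformanceRequirement,
                      sampled_response_seconds: list[float],
                      transactions: int,
                      errors: int) -> bool:
    """Return True if every sample and the overall error rate satisfy req."""
    if not sampled_response_seconds or transactions <= 0:
        raise ValueError("need at least one sample and one transaction")
    worst_response = max(sampled_response_seconds)
    errors_per_million = errors / transactions * 1_000_000
    return (worst_response <= req.max_response_seconds
            and errors_per_million < req.max_errors_per_million)


if __name__ == "__main__":
    page_refresh = PerformanceRequirement(max_response_seconds=4.0,
                                          max_errors_per_million=3.4)
    # Invented measurements: samples taken at 15-minute intervals.
    samples = [1.2, 3.9, 2.5]
    print(meets_requirement(page_refresh, samples, transactions=2_000_000, errors=6))
```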

Performance requirements are closely related to availability requirements (how failover impacts performance) and latent capacity (how much capacity is available to handle unusual peak loads).

Availability

Availability is a way to specify the uptime of a system and is typically measured as the percentage of time that the system is accessible to users. The time that the system is not accessible (downtime) can be due to the failure of hardware, software, the network, or any other factor (such as loss of power) that causes the system to be down. Scheduled downtime for service (maintenance and upgrades) is not considered downtime. A basic equation to calculate system availability in terms of percentage of uptime is:

Availability = uptime / (uptime + downtime) * 100%

Typically you measure availability by the number of “nines” you can achieve. For example, 99% availability is two nines. Specifying additional nines significantly affects the deployment design. The following table quantifies the unscheduled downtime for additional nines of availability to a system that is running 24x7 year-round (a total of 8,760 hours).

Table 3–3 Unscheduled Downtime for a System Running Year-Round (8,760 hours)


Number of Nines	Percentage Available	Unscheduled Downtime
2	99%	88 hours
3	99.9%	9 hours
4	99.99%	53 minutes
5	99.999%	5 minutes
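
The downtime budget for a given availability target follows directly from the availability equation. The Python sketch below (function names are illustrative) computes the percentage from uptime and downtime, and the allowed unscheduled downtime for each number of nines over an 8,760-hour year.

```python
HOURS_PER_YEAR = 8760  # 24x7 operation over a 365-day year


def availability_percent(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = uptime / (uptime + downtime) * 100%."""
    return uptime_hours / (uptime_hours + downtime_hours) * 100.0


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Unscheduled downtime permitted by an availability target over a period."""
    return period_hours * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for nines, pct in enumerate([99.0, 99.9, 99.99, 99.999], start=2):
        hours = allowed_downtime_hours(pct)
        print(f"{nines} nines ({pct}%): {hours:.2f} hours, or {hours * 60:.1f} minutes")
```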

Fault-Tolerant Systems

Availability requirements of four or five nines typically require a system that is fault-tolerant. A fault-tolerant system must be able to continue service even during a hardware or software failure. Typically, fault tolerance is achieved by redundancy in both hardware (such as CPUs, memory, and network devices) and in software providing key services.

A single point of failure is a hardware or software component that is part of a critical path but is not backed up by redundant components. The failure of this component results in the loss of service for the system. When designing a fault-tolerant system, you must identify and eliminate potential single points of failure.
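
To make the idea concrete, the following Python sketch models a critical path as a list of components, flags any component that has no redundant instance, and estimates the combined availability of the path. The component names and availability figures are hypothetical, and the calculation rests on a simplifying assumption that is not stated in this guide: instances fail independently, so a component with n instances is unavailable only when all n are down.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Component:
    """Hypothetical description of one component on the critical path."""
    name: str
    availability: float  # availability of a single instance, 0.0 to 1.0
    instances: int       # number of redundant instances deployed


def redundant_availability(component: Component) -> float:
    """Availability of a component, assuming independent instance failures."""
    return 1.0 - (1.0 - component.availability) ** component.instances


def path_availability(path: list[Component]) -> float:
    """All components on the path must be up, so availabilities multiply."""
    return prod(redundant_availability(c) for c in path)


def single_points_of_failure(path: list[Component]) -> list[str]:
    """Names of critical-path components with no redundant instance."""
    return [c.name for c in path if c.instances < 2]


if __name__ == "__main__":
    path = [
        Component("load balancer", 0.999, 1),
        Component("web server", 0.99, 2),
        Component("directory server", 0.99, 2),
    ]
    print(single_points_of_failure(path))
    print(f"{path_availability(path):.5f}")
```

In this invented example the unreplicated load balancer caps the availability of the whole path, which is why single points of failure are the first thing to eliminate in a fault-tolerant design.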

Fault-tolerant systems can be expensive to implement and maintain. Make sure you understand the nature of the business requirements for availability and consider the strategies and costs of availability solutions that meet those requirements.

Prioritizing Service Availability

From a user perspective, availability often applies on a service-by-service basis rather than to the entire system. For example, the unavailability of instant messaging services usually has little or no impact on the availability of other services. However, the unavailability of services upon which many other services depend (such as Directory Server) has a much wider impact. Higher availability specifications should clearly reference specific use cases and usage analysis that require the increased availability.

It is helpful to list availability needs according to an ordered set of priorities. The following table prioritizes the availability of different types of services.

Table 3–4 Availability of Services by Priority


Priority	Service Type	Description
1	Mission critical	Services that must be available at all times. For example, database services (such as LDAP directories) to applications.
2	Must be available	Services that must be available, but can be available at reduced performance. For example, messaging service availability might not be critical in some business environments.
3	Can be postponed	Services that must be available within a given time period. For example, calendar services availability might not be essential in some business environments.
4	Optional	Services that can be postponed indefinitely. For example, in some environments instant messaging services can be considered useful but not necessary.

Loss of Services

Availability design includes consideration for what happens when availability is compromised or when a component is lost. This includes considering whether connected users must restart sessions and how a failure in one area affects other areas of a system. QoS requirements should consider these scenarios and specify how the deployment reacts to these situations.

Scalability

Scalability is the ability to add capacity to a system so that the system can support additional load from existing users or from an increased user base. Scalability usually requires the addition of resources, but should not require changes in the design of the deployment architecture or cause loss of service while resources are being added.

As with availability, scalability applies more to individual services provided by a system than to the entire system. However, for services upon which other services depend, such as Directory Server, scalability can have system-wide impact.

You do not necessarily specify scalability requirements with QoS requirements unless projected growth of the deployment is clearly stated in the business requirements. However, during the deployment design phase of the solution life cycle, the deployment architecture should always add some tolerance for scaling the system even if no QoS requirements for scalability have been specified.

Estimating Growth

Estimating the growth of a system to determine scalability requirements involves working with projections, estimates, and guesses that might not be fulfilled. The following are three keys to developing requirements for a scalable system:

High performance design strategy. During the specification of performance requirements, include latent capacity to handle loads that might increase over time. Also, maximize availability within budget constraints. This strategy allows you to absorb growth and better schedule milestones for scaling the system.
Incremental deployment. Incremental deployment helps with scheduling the addition of resources. Specify clear milestones for scaling the system. Milestones are typically load-based requirements coordinated with specific dates for assessing scalability.
Extensive performance monitoring. Monitoring performance helps determine when to add resources to the system. Requirements for monitoring performance can provide guidance to operators and administrators responsible for maintenance and upgrades.

The following table lists factors to consider for determining scalability requirements.

Table 3–5 Scalability Factors


Topic	Description
Analyze usage patterns	Understand the usage patterns of the current (or projected) user base by studying existing data. In the absence of current data, analyze industry data or market estimates.
Design for reasonable maximum scale	Design with a goal towards the maximum required scale for both known demand and possible demand. Often, this is a 24-month estimate based on performance evaluation of the existing user load and reasonable expectations of future load. The time period for the estimate depends largely on the reliability of projections.
Set appropriate milestones	Implement the deployment design in increments to meet short-term requirements with a buffer to allow for unexpected growth. Set milestones for adding system resources. For example, consider the following: capital acquisition (such as quarterly or yearly); lead time to purchase hardware and software (such as one to six weeks); buffer (10% to 100%, depending on growth expectations).
Incorporate emerging technology	Understand emerging technology, such as faster processors and Web servers, and how this technology can affect the performance of the underlying architecture.
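
The interaction between growth projections, buffer, and lead time can be sketched as follows. This Python example is a simplified model under stated assumptions: the user base grows by a constant monthly rate, capacity is expressed as a number of users, and lead time is counted in whole months. The function names and sample figures are hypothetical.

```python
from typing import Optional


def months_until_capacity(current_users: int,
                          capacity_users: int,
                          monthly_growth: float,
                          buffer: float = 0.10,
                          horizon_months: int = 24) -> Optional[int]:
    """Return the first month in which projected users plus buffer exceed
    capacity, or None if that does not happen within the horizon."""
    if monthly_growth < 0 or buffer < 0:
        raise ValueError("growth and buffer must be non-negative")
    users = float(current_users)
    for month in range(horizon_months + 1):
        if users * (1 + buffer) > capacity_users:
            return month
        users *= 1 + monthly_growth
    return None


def order_by_month(exceed_month: int, lead_time_months: int = 2) -> int:
    """Latest month to start purchasing so that resources arrive in time."""
    return max(0, exceed_month - lead_time_months)


if __name__ == "__main__":
    # Invented figures: 10,000 users today, capacity for 15,000, 5% growth per month.
    month = months_until_capacity(10_000, 15_000, monthly_growth=0.05)
    if month is None:
        print("Capacity is sufficient for the whole planning horizon")
    else:
        print(f"Add capacity by month {month}; start purchasing by month {order_by_month(month)}")
```

The default horizon of 24 months mirrors the estimate period mentioned in the table; a shorter horizon is appropriate when projections are less reliable.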

Security Requirements

Security is a complex topic that involves all levels of a deployed system. Developing security requirements revolves around identifying the security threats and developing a strategy to combat them. This security analysis includes the following steps:

Identifying critical assets
Identifying threats to those assets
Identifying vulnerabilities that expose the assets to those threats and so create risk to the organization
Developing a security plan that mitigates the risk to the organization

The analysis of security requirements should involve a cross-section of stakeholders from your organization, including managers, business analysts, and information technology personnel. Often, an organization appoints a security architect to take the lead in the design and implementation of security measures.

The following section describes some of the areas that are covered in security planning.

Elements of a Security Plan

Planning for security of a system is part of deployment design that is essential to successful implementation. Consider the following when planning for security:

Physical security. Physical security covers physical access to routers, servers, server rooms, data centers, and other parts of your infrastructure. Other security measures are compromised if an unauthorized person can walk into a server room and unplug routers.
Network security. Network security covers access to your network, controlled through firewalls, secure access zones, access control lists, and port access. For network security, you develop strategies to guard against unauthorized access, tampering, and denial of service (DoS) attacks.
Application and application data security. Application and application data security covers access to user accounts, corporate data, and enterprise applications through authentication and authorization procedures and policies. This area includes defining the following policies:
- Password policies
- Access rights, such as delegated administration to users as opposed to administrator access
- Account inactivation
- Access control
- Encryption policies, including secure transport of data and using certificates to sign data
Personal security practices. An organization-wide security policy defines the working environment and practices with which all users must comply to ensure that other security measures perform as designed. Typically, you develop a handbook or manual on security and also offer training to users on security practices. For an effective overall security policy, sound security practices must become part of the organization’s culture.

Latent Capacity

Latent capacity is the ability of a deployment to handle unusual peak load usage without the addition of resources. Typically, you do not specify QoS requirements directly around latent capacity, but this system quality is a factor in the availability, performance, and scalability of the system.

Serviceability Requirements

Serviceability is the ease with which a deployed system can be maintained, including tasks such as monitoring the system, repairing problems that arise, adding and removing users from the system, and upgrading hardware and software components.

When planning requirements for serviceability, consider the topics listed in the following table.

Table 3–6 Topics for Serviceability Requirements


Topic	Description
Downtime planning	Identify maintenance tasks that require specific services to be unavailable or partially unavailable. Some maintenance and upgrades can occur seamlessly to users, while others require interruption of service. When possible, schedule with users those maintenance activities that require downtime, allowing the users to plan for the downtime.
Usage patterns	Identify the usage patterns to determine the best time to schedule maintenance. For example, on systems where peak usage is during normal business hours, schedule maintenance in the evening or on weekends. For geographically distributed systems, identifying these times can be more challenging.
Availability	Serviceability is often a reflection of your availability design. Strategies for minimizing downtime for maintenance and upgrades revolve around your availability strategy. Systems that require a high degree of availability have limited opportunities for maintenance, upgrades, and repair. Strategies for handling availability requirements affect how you handle maintenance and upgrades. For example, on systems that are distributed geographically, servicing can depend on the ability to route workloads to remote servers during maintenance periods. Also, systems requiring a high degree of availability might require more sophisticated solutions that automate restarting of systems with little human intervention.
Diagnostics and monitoring	You can improve the stability of a system by regularly running diagnostic and monitoring tools to identify problem areas. Regular monitoring can help you avoid problems before they occur, balance workloads according to availability strategies, and improve planning for maintenance and downtime.
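
As a small illustration of using usage patterns to schedule maintenance, the following Python sketch picks the start hour of the quietest contiguous window from a 24-entry hourly load profile. The function name and the sample load figures are hypothetical. The window is allowed to wrap past midnight, and for a geographically distributed system the profile is assumed to be the combined load of all regions expressed in a single time zone.

```python
def quietest_window(hourly_load: list[float], window_hours: int) -> int:
    """Return the start hour of the contiguous window with the lowest total load.

    The window wraps around the end of the list, so a window that starts at
    hour 23 continues into hour 0. Ties are resolved in favor of the earliest hour.
    """
    hours = len(hourly_load)
    if not 0 < window_hours <= hours:
        raise ValueError("window must be between 1 hour and the profile length")

    def window_total(start: int) -> float:
        return sum(hourly_load[(start + offset) % hours]
                   for offset in range(window_hours))

    return min(range(hours), key=window_total)


if __name__ == "__main__":
    # Invented load profile (requests per second for hours 0 through 23).
    load = [5, 3, 2, 2, 4, 10, 30, 60, 90, 100, 100, 95,
            90, 95, 100, 95, 80, 60, 40, 30, 20, 15, 10, 8]
    start = quietest_window(load, window_hours=3)
    print(f"Schedule the 3-hour maintenance window to start at {start:02d}:00")
```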

Service Level Requirements

A service level agreement (SLA) specifies minimum performance requirements and, upon failure to meet those requirements, the level and extent of customer support that must be provided. Service level requirements are system requirements that specify the conditions upon which the SLA is based.

As with QoS requirements, service level requirements derive from business requirements and represent a guarantee about the overall system quality that the deployed system must meet. Because the service level agreement is considered to be a contract, specification of service level requirements should be unambiguous. The service level requirements define exactly under what conditions the requirements are tested and precisely what constitutes failure to meet the requirements.