Scalability is the ability to add capacity to your system, usually by the addition of system resources, but without changes to the deployment architecture. During requirements analysis, you typically make projections of expected growth to a system based on the business requirements and subsequent usage analysis. These projections of the number of users of a system and the capacity of the system to meet their needs are often estimates that can vary significantly from the actual numbers for the deployed system. Your design should be flexible enough to allow for variance in your projections.
A design that is scalable includes sufficient latent capacity to handle increased loads until a system can be upgraded with additional resources. Scalable designs can be readily scaled to handle increasing loads without redesign of the system.
Latent capacity is one aspect of scalability where you include additional performance and availability resources into your system so the system can easily handle unusual peak loads. You can also monitor how latent capacity is used in a deployed system to help determine when to scale the system by adding resources. Latent capacity is one way to build safety into your design.
Analysis of use cases can help identify the scenarios that can create unusual peak loads. Use this analysis of unusual peak loads plus a factor to cover unexpected growth to design latent capacity that builds safety into your system.
Your system design should be able to handle projected capacity for a reasonable time, generally the first 6 to 12 months of operation. Maintenance cycles can be used to add resources or increase capacity as needed. Ideally, you should be able to schedule upgrades to the system on a regular basis, but predicting needed increases in capacity is often difficult. Rely on careful monitoring of your resources as well as business projections to determine when to upgrade a system.
If you plan to implement your solution in incremental phases, you might schedule increasing the capacity of the system to coincide with other improvements scheduled for each incremental phase.
The example in this section illustrates horizontal and vertical scaling for a solution that implements Messaging Server. For vertical scaling, you add additional CPUs to a server to handle increasing loads. For horizontal scaling, you handle increasing loads by adding additional servers for distribution of the load.
The baseline for the example assumes a 50,000 user base supported by two message store instances that are distributed for load balancing. Each server has two CPUs for a total of four CPUs. The following figure shows how this system can be scaled to handle increasing loads for 250,000 users and 2,000,000 users.
Scalability Example shows the differences between vertical scaling and horizontal scaling. This figure does not show other factors to consider when scaling, such as load balancing, failover, and changes in usage patterns.