Scalability is the ability of your workload to meet business demands in near real time without disrupting quality of service. Traditionally, additional capacity has been assigned in advance to accommodate predictable traffic spikes, while unexpected spikes are handled reactively: traffic is monitored, and alerts are published to trigger capacity increases.

In this model, scaling up and down requires manual intervention, and each change depends on calculating how much capacity to add or remove and how long to keep it. Managing these requirements becomes more tedious when spikes are frequent and unpredictable.

These scenarios have typically been managed by permanently provisioning extra capacity for the workload, which accommodates the load but incurs higher costs over the long term. Unlike on-premises environments, where hardware capacity is procured regardless of use, managing this type of capacity in the cloud doesn't require a permanent investment in infrastructure, and no hardware refresh is required to stay up to date.

A workload can increase demand for CPU, memory, storage, and input/output (I/O) capacity. CPU and memory are the primary parameters and tend to fluctuate the most. Capacity can be scaled up and down in two ways:

  • Horizontal scaling: Instances are added or removed in parallel to accommodate changes in load.
  • Vertical scaling: CPU and memory are increased or decreased without changing the number of instances.

Horizontal scaling is straightforward to achieve by putting a load balancer in front of the backend infrastructure, which distributes load without affecting existing, ongoing transactions. Vertical scaling requires the infrastructure itself to be scaled up or down: the desired capacity is provisioned in parallel, and the workload is transferred to it. This transfer causes minimal or no impact on existing transactions if the application is designed to handle it.
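As a rough illustration of the difference, the sketch below sizes capacity both ways for the same peak load. It assumes throughput scales linearly with instance count or OCPU count, which real systems rarely achieve exactly. The function names, the 20% headroom default, and the `max_ocpus` shape limit are hypothetical.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float, headroom: float = 0.2) -> int:
    """Horizontal scaling: identical instances behind a load balancer.

    Assumes each instance serves rps_per_instance and load is spread evenly.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    return math.ceil(peak_rps * (1 + headroom) / rps_per_instance)


def ocpus_needed(peak_rps: float, rps_per_ocpu: float, max_ocpus: int, headroom: float = 0.2) -> int:
    """Vertical scaling: one instance resized to enough OCPUs.

    Unlike horizontal scaling, this is capped by the largest available shape.
    """
    if rps_per_ocpu <= 0:
        raise ValueError("rps_per_ocpu must be positive")
    needed = math.ceil(peak_rps * (1 + headroom) / rps_per_ocpu)
    if needed > max_ocpus:
        raise ValueError(
            f"{needed} OCPUs needed but the shape allows {max_ocpus}; scale horizontally instead"
        )
    return needed
```

For a peak of 900 requests per second, `instances_needed(900, 250)` returns 5. By contrast, `ocpus_needed(900, 100, max_ocpus=8)` raises an error because a single instance can't grow far enough.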

When scaling an application, it's important to consider not just the workload of the application itself, but also the communication mechanism that connects it with other applications. A communication mechanism is an essential component of any distributed application, allowing different parts of the system to exchange information and work together to achieve common goals. As the workload of an application increases, so does the demand on its communication mechanism. If the communication mechanism isn't designed to scale, it can become a bottleneck, limiting the overall performance of the application. This can lead to degraded user experience, increased latency, and even system failures.

Designing an application architecture to scale elastically requires thorough consideration of compute, memory, storage, network, throughput, and throttling limits. The following sections discuss common scenarios for designing a high-performance, scalable system. Determine your approach based on your organization's needs, business values, and priorities for running and deploying your workload in OCI.

On-Demand Scaling

On-demand scaling is a cloud computing capability that automatically adjusts the resources allocated to your application or service based on current demand. You can increase or decrease the number of servers, the amount of storage, and other resources in real time as the workload changes. For example, OCI Compute autoscaling adjusts the number of compute instances in an instance pool based on metrics or a schedule, to meet the capacity defined in the autoscaling configuration.
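The schedule-based side of this can be pictured as a lookup from time of day to a target instance count. The sketch below is a hypothetical model, not the OCI autoscaling API. OCI defines schedules in its own autoscaling configuration, and `ScheduleWindow` and `scheduled_capacity` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence


@dataclass(frozen=True)
class ScheduleWindow:
    start_hour: int  # inclusive, 0-23
    end_hour: int    # exclusive, 1-24
    instance_count: int


def scheduled_capacity(now: datetime, windows: Sequence[ScheduleWindow], default_count: int) -> int:
    """Return the target pool size for the current hour.

    The first matching window wins; outside all windows the default applies.
    """
    for window in windows:
        if window.start_hour <= now.hour < window.end_hour:
            return window.instance_count
    return default_count


# Hypothetical schedule: 6 instances during business hours, otherwise the default.
BUSINESS_HOURS = [ScheduleWindow(start_hour=8, end_hour=18, instance_count=6)]
```

`scheduled_capacity(datetime(2024, 1, 15, 9), BUSINESS_HOURS, default_count=2)` returns 6, and at 20:00 it returns 2. In practice, you would combine a schedule with metric-based rules so that unexpected spikes are still handled.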

Scale-In and Scale-Out

Scale-in reduces the number of resources allocated to an application or service, while scale-out increases it. Together, they let you handle spikes in load without paying for peak capacity all the time. A related approach for a single instance is OCI Burstable Instances, which are designed for scenarios where an instance is typically idle or has low CPU utilization with occasional spikes. After a burst is finished, the instance returns to its baseline capacity.
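The following is a minimal sketch of metric-based scale-out and scale-in that uses average CPU utilization as the metric. The thresholds, step size, and bounds are hypothetical. A real autoscaler, such as OCI autoscaling, also applies a cooldown period between actions, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdPolicy:
    min_instances: int
    max_instances: int
    scale_out_cpu_pct: float = 70.0  # average CPU above this adds instances
    scale_in_cpu_pct: float = 30.0   # average CPU below this removes instances
    step: int = 1

    def __post_init__(self) -> None:
        if not 0 <= self.min_instances <= self.max_instances:
            raise ValueError("need 0 <= min_instances <= max_instances")
        if self.scale_in_cpu_pct >= self.scale_out_cpu_pct:
            raise ValueError("scale-in threshold must be below scale-out threshold")


def next_instance_count(current: int, avg_cpu_pct: float, policy: ThresholdPolicy) -> int:
    """Return the pool size after one evaluation of the policy."""
    if avg_cpu_pct > policy.scale_out_cpu_pct:
        target = current + policy.step  # scale out
    elif avg_cpu_pct < policy.scale_in_cpu_pct:
        target = current - policy.step  # scale in
    else:
        target = current
    # Clamp to the configured bounds.
    return max(policy.min_instances, min(policy.max_instances, target))
```

The gap between the scale-in and scale-out thresholds keeps the pool from oscillating when utilization hovers around a single value.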

Service Limits and Quotas

Design your workload so that load is balanced across the application's modules and each module can scale on its own. Also consider tenancy service limits and compartment quotas. For example, under the Oracle Universal Credits subscription model, you can create 50 load balancers with 5000 Mbps of bandwidth (at the time of writing; limits may change). Some service limits are dynamic and can increase over time based on usage. Refer to your OCI pricing model for the limits and quotas that apply to you.
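Before a planned scale-out, it can help to compare current usage plus the requested increase against the applicable limits. The sketch below assumes you have already retrieved limits and usage, for example from the console. The resource keys are hypothetical, and the values in the check below mirror the example figures above.

```python
from typing import Mapping


def limit_violations(
    limits: Mapping[str, float],
    in_use: Mapping[str, float],
    requested: Mapping[str, float],
) -> list[str]:
    """List every resource whose usage after the request would exceed its limit."""
    problems = []
    for resource, amount in requested.items():
        if resource not in limits:
            problems.append(f"{resource}: no known limit; check before scaling")
            continue
        total = in_use.get(resource, 0) + amount
        if total > limits[resource]:
            problems.append(f"{resource}: {total} needed exceeds limit {limits[resource]}")
    return problems
```

An empty result means the request fits. Otherwise, request a limit increase before scaling, or redesign the deployment.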

High Availability

Eliminate single points of failure and design your workload to maximize uptime and accessibility. OCI provides high availability through fault domains and availability domains, which let you build redundancy into your workload.

For example, distributing your application infrastructure across more than one fault domain provides anti-affinity, so that your workload doesn't run on the same physical hardware within a data center. Availability domains add another layer of reliability. They are separate data centers in the same region that are isolated from each other and fault tolerant, so they're unlikely to fail simultaneously. Availability domains within a region are connected by a low-latency, high-bandwidth network, which minimizes latency between them.
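One simple way to get this anti-affinity is to assign instances round-robin across availability domains and fault domains. The sketch assumes the three fault domain names that OCI uses in each availability domain (`FAULT-DOMAIN-1` to `FAULT-DOMAIN-3`). The availability domain names in the check are placeholders. In a region with a single availability domain, pass one name, and the plan spreads instances across fault domains only.

```python
from itertools import cycle
from typing import Sequence

FAULT_DOMAINS = ("FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3")


def placement_plan(instance_count: int, availability_domains: Sequence[str]) -> list[tuple[str, str]]:
    """Assign each instance an (availability domain, fault domain) pair.

    Availability domains alternate fastest, so consecutive instances land in
    different data centers when more than one is available.
    """
    if not availability_domains:
        raise ValueError("at least one availability domain is required")
    slots = cycle([(ad, fd) for fd in FAULT_DOMAINS for ad in availability_domains])
    return [next(slots) for _ in range(instance_count)]
```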

Disaster Recovery

In case of catastrophic failure from natural or human-made disasters, it's essential to have a contingency plan that replicates the workload into another region. Recovery is the only option in this type of scenario. Design the level of recovery based on application criticality and the associated recovery time objective (RTO) and recovery point objective (RPO). OCI provides multiple disaster recovery (DR) approaches based on data durability, RTO, and RPO.

For example, a low-cost DR implementation rebuilds your system in another region from a database backup and provisions the infrastructure with infrastructure as a service (IaaS) and DevOps. The trade-off is more data loss and longer unavailability. An active/active system with real-time replication and load sharing costs more, but offers near-zero data loss and downtime.
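One way to frame this trade-off is to choose the cheapest strategy whose expected RTO and RPO meet your targets. In the sketch below, the strategy list (including the intermediate warm standby option) and every RTO, RPO, and cost figure are hypothetical placeholders. Replace them with your own estimates.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DrStrategy:
    name: str
    rto_minutes: float  # expected time to restore service
    rpo_minutes: float  # expected window of data loss
    relative_cost: float


def cheapest_strategy(
    strategies: Sequence[DrStrategy], rto_target: float, rpo_target: float
) -> Optional[DrStrategy]:
    """Return the lowest-cost strategy that meets both targets, or None."""
    eligible = [
        s for s in strategies if s.rto_minutes <= rto_target and s.rpo_minutes <= rpo_target
    ]
    return min(eligible, key=lambda s: s.relative_cost, default=None)


# Hypothetical estimates, for illustration only.
STRATEGIES = [
    DrStrategy("rebuild from backup", rto_minutes=24 * 60, rpo_minutes=24 * 60, relative_cost=1),
    DrStrategy("warm standby", rto_minutes=60, rpo_minutes=15, relative_cost=4),
    DrStrategy("active/active", rto_minutes=1, rpo_minutes=0, relative_cost=10),
]
```

A result of `None` means that no listed strategy meets the targets. In that case, either relax the objectives or consider a more expensive design.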

Capacity Planning

Determine the resources needed to support the current and future demands of your workload based on performance requirements and growth projections. This lets you allocate the right amount of resources and configure autoscaling appropriately. You can use historical monitored metrics for these projections.

For example, you must provision the shape and size of database storage in advance, based on a linear capacity projection rather than on demand. Using historical usage statistics can help you define this capacity and avoid ad hoc scaling.
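The following is a minimal sketch of such a projection. It fits a least-squares line to historical usage, extrapolates the line, and adds headroom. The sketch assumes that growth really is linear. The 25% headroom and the sample figures are hypothetical.

```python
import math
from typing import Sequence


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_capacity(
    months: Sequence[float], used_gb: Sequence[float], target_month: float, headroom: float = 0.25
) -> int:
    """Extrapolate usage to target_month and round up after adding headroom."""
    slope, intercept = linear_fit(months, used_gb)
    return math.ceil((slope * target_month + intercept) * (1 + headroom))
```

With storage readings of 100, 110, 120, and 130 GB over four months, the projection for month 12 is 220 GB, or 275 GB with 25% headroom.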

Performance Monitoring

Measure, track, and analyze cloud resource and application performance statistics to ensure that they meet performance requirements and deliver optimal performance to users. Monitoring lets you identify performance issues and take corrective action before they affect users. OCI Application Performance Monitoring provides a comprehensive set of features to monitor applications and diagnose performance issues.
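Latency targets are usually stated as percentiles rather than averages, because averages hide the slow tail that users notice. The following is a small sketch of the nearest-rank percentile method applied to collected latency samples. The 300 ms objective in the usage note is hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_objective(samples: Sequence[float], pct: float, threshold_ms: float) -> bool:
    """True if the pct-th percentile latency is within threshold_ms."""
    return percentile(samples, pct) <= threshold_ms
```

`meets_objective(latencies_ms, 95, 300)` checks whether 95% of requests completed within 300 ms.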

For example, Application Performance Monitoring provides deep visibility into performance and the ability to diagnose issues quickly, so you can focus on delivering core business processes. It monitors multiple components and application logic spread across clients, third-party services, and back-end computing tiers, whether on premises or in the cloud.

Performance Principles

Design applications so that their individual modules and components can scale independently. To achieve this, your application should follow OCI best practices so that it can scale up with near-zero downtime.

For example, if your application is already broken into thin microservices, you can easily deploy and scale it for high-performance scenarios on OCI Container Engine for Kubernetes (OKE) or OCI Container Instances. You can also consider implementation approaches such as load balancers, Scatter-Gather (pooling and detaching), result cache, shared space (information enrichment), pipe and filter, MapReduce (batch join for I/O bottlenecks), bulk synchronous parallel, and Execution Orchestrator. Well-known design patterns such as Caching, Command Query Responsibility Segregation (CQRS), Anti-Corruption Layer, Circuit Breaker, Event Sourcing, Publisher-Subscriber, Sharding, Strangler, Saga, and Throttling also help your application perform well natively in the cloud.
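To illustrate one of these patterns, the following is a minimal single-threaded Circuit Breaker sketch. After a run of consecutive failures, it fails fast instead of calling a struggling dependency, then allows a trial call after a timeout. The thresholds are hypothetical. A production implementation would also need thread safety and metrics.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Failing fast protects the caller's threads and connection pools while the dependency recovers, which keeps one slow component from degrading the whole system.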

Cost Versus Capacity

Cloud capacity can be increased or decreased by adding or removing infrastructure. To reduce total running cost, use the capacity you've already provisioned efficiently. In some scenarios, provisioning capacity up front for business-critical applications is justified to avoid SLA breaches. Because higher performance generally comes at a higher cost, base capacity decisions on business needs and priorities.


One aspect of designing cloud applications for scalability is handling increasing load or traffic cost-effectively. Design and implement applications to use resources efficiently and keep costs down. Monitor resource use and automate autoscaling as much as possible, while balancing cost optimization against performance. OCI provides configurable autoscaling in most of its services, and some cloud services can be automated with third-party systems.
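To see why autoscaling tends to be more cost-effective, compare the instance-hours used by a pool that is permanently sized for peak with a pool that follows demand. This sketch assumes that autoscaling reacts instantly and ignores cooldowns and boot time, so real savings are smaller. The demand profile is hypothetical. To get cost, multiply instance-hours by your shape's hourly price.

```python
from typing import Sequence


def static_instance_hours(hourly_demand: Sequence[int]) -> int:
    """Pool permanently sized for the busiest hour."""
    return max(hourly_demand) * len(hourly_demand)


def autoscaled_instance_hours(hourly_demand: Sequence[int], min_instances: int = 1) -> int:
    """Pool that tracks demand each hour but never drops below min_instances."""
    return sum(max(demand, min_instances) for demand in hourly_demand)


# Hypothetical day: 2 instances needed for 16 quiet hours, 8 during an 8-hour peak.
DEMAND = [2] * 16 + [8] * 8
```

For this profile, static sizing uses 192 instance-hours per day and the autoscaled pool uses 96.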


Trade-offs are an important consideration for cloud applications and scalability. They involve making decisions about which aspects of an application to prioritize to achieve scalability. As applications grow and become more complex, it becomes increasingly difficult to optimize all aspects of the application simultaneously. Therefore, you must make trade-offs to achieve optimal scalability.

Some common trade-offs you must consider are:

  • Performance versus cost
  • Flexibility versus complexity
  • Resilience versus cost
  • Scalability versus security
  • Time to market versus scalability

Use of Serverless and Containers

Using serverless and container technologies can significantly improve scalability in OCI. They let you break applications down into smaller, more efficient components that can be scaled up or down independently as needed. In OCI, you can use OCI Container Engine for Kubernetes (OKE) to deploy containers and scale microservices independently. Also consider OCI Functions and OCI Container Instances, which provide elastic, serverless platforms for deploying highly scalable services.

Database Consideration

Data is the most critical part of any application, and your entire system relies on its integrity. It's important to choose the database type that suits your application and business needs.

Information was once kept almost exclusively in relational database management systems (RDBMS). In the cloud, you can choose from a variety of database options and providers that cater to specific business needs and range from very small to very large data capacity.

For example, Oracle Autonomous Database is the best fit for transactional data, whereas MySQL HeatWave is more suitable for analytics and machine learning. Oracle NoSQL Database suits simple data models and provides predictable single-digit millisecond latency for simple queries. Consider Exadata infrastructure when you need stronger security, higher performance, and isolation.

Storage Consideration

Data comes in many formats and sizes, with different access frequencies, but it always requires storage space. OCI offers object, block, and file storage, each designed for a specific purpose. OCI Object Storage stores any type of content in the Standard, Infrequent Access, and Archive tiers. OCI File Storage provides a traditional file system that can be mounted as network storage over NFS. OCI Block Volume attaches to an instance and can be used as a boot volume or a block volume.

Performance Baseline

A performance baseline determines the level of performance an application needs in the cloud to sustain anticipated load. It establishes a reference point against which you measure the changes required when scaling to meet the desired performance level.
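Once you record a baseline, you can compare later measurements against it to flag regressions. The metric names, baseline values, and 10% tolerance below are hypothetical.

```python
from typing import Collection, Mapping

# Hypothetical baseline recorded under the anticipated load.
BASELINE = {"p95_latency_ms": 180.0, "throughput_rps": 1200.0, "error_rate_pct": 0.5}
HIGHER_IS_BETTER = {"throughput_rps"}


def regressions(
    current: Mapping[str, float],
    baseline: Mapping[str, float],
    higher_is_better: Collection[str] = HIGHER_IS_BETTER,
    tolerance: float = 0.10,
) -> list[str]:
    """Return the metrics that are worse than baseline by more than the tolerance."""
    worse = []
    for metric, base in baseline.items():
        value = current[metric]
        if metric in higher_is_better:
            if value < base * (1 - tolerance):
                worse.append(metric)
        elif value > base * (1 + tolerance):
            worse.append(metric)
    return worse
```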

Load and Stress Test

Load testing simulates near-real user load to measure how an application responds to increased traffic. It checks whether a system can handle expected user load without degraded performance or crashes, and it validates the system's scalability under variable load. Observing response behavior under load helps you identify bottlenecks and optimize system configuration and capacity planning.

Stress testing goes beyond expected user load. It validates an application's ability to handle sudden traffic spikes or unusual, unexpected load without breaking down or drastically degrading performance. Pushing the system to its limits identifies weaknesses in system architecture or capacity, which helps you optimize the system's scalability and resiliency.
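The following is a minimal load-test harness sketch that uses only the Python standard library. It sends a fixed number of calls through a thread pool and reports errors and latency. `handler` stands in for a hypothetical function that calls your application. For real tests, a dedicated load-testing tool is usually a better fit. Running the harness at increasing concurrency levels turns it into a crude stress test.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load(handler: Callable[[], object], total_requests: int, concurrency: int) -> dict:
    """Call handler total_requests times with up to `concurrency` calls in flight."""

    def timed_call(_: int) -> tuple[bool, float]:
        start = time.perf_counter()
        try:
            handler()
            ok = True
        except Exception:
            ok = False
        return ok, (time.perf_counter() - start) * 1000  # latency in ms

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    ok_latencies = [ms for ok, ms in results if ok]
    return {
        "concurrency": concurrency,
        "requests": total_requests,
        "errors": total_requests - len(ok_latencies),
        "median_ms": statistics.median(ok_latencies) if ok_latencies else None,
        "max_ms": max(ok_latencies) if ok_latencies else None,
    }
```

For example, `for c in (1, 4, 16, 64): print(run_load(call_my_service, 500, c))` shows the concurrency level at which latency climbs or errors appear. `call_my_service` is a hypothetical client function of your own.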

Identify Bottlenecks

In cloud applications, identifying bottlenecks and handling cascading performance degradation are critical to ensuring scalability and performance. Identify points in the system where the flow of data or processing is restricted or slowed down. These points decrease performance and can prevent the system from reaching the desired capacity. Bottlenecks can occur in various areas, including network, storage, processing, and database access.
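When a request path passes through several stages in series, end-to-end throughput can't exceed the capacity of the slowest stage, so that stage is the bottleneck. The sketch assumes that every request visits every stage once. The stage names and capacities are hypothetical; in practice, you would measure them with load tests.

```python
from typing import Mapping


def find_bottleneck(stage_capacity_rps: Mapping[str, float], offered_rps: float):
    """Return (bottleneck stage, achievable throughput, per-stage load ratio).

    The load ratio is offered load divided by stage capacity; a value above
    1.0 means the stage is asked for more than it can serve.
    """
    if not stage_capacity_rps:
        raise ValueError("no stages")
    bottleneck = min(stage_capacity_rps, key=stage_capacity_rps.__getitem__)
    throughput = min(offered_rps, stage_capacity_rps[bottleneck])
    load_ratio = {stage: offered_rps / cap for stage, cap in stage_capacity_rps.items()}
    return bottleneck, throughput, load_ratio
```

With stage capacities of 5000, 1800, and 1200 requests per second for the load balancer, app tier, and database, an offered load of 1500 requests per second is capped at 1200 by the database. Scaling the app tier further wouldn't raise throughput.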

Data Driven Approach

Use monitoring tools and data analytics to collect and analyze data about system performance, usage patterns, and user behavior. Then use the resulting insights to optimize system performance, capacity, and scalability. This leads to a better user experience and increased business value. Key steps to implement a data-driven approach are:

  • Collect and analyze data.
  • Identify patterns and trends.
  • Optimize system capacity based on insights.
  • Continuously monitor and adjust system performance.
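As a small example of the pattern-finding step, the sketch below flags metric samples that deviate sharply from a trailing window, using a z-score. The window size and threshold are hypothetical. In production, you would more likely rely on your monitoring platform's alarms than on hand-written code.

```python
from statistics import mean, pstdev
from typing import Sequence


def anomalies(series: Sequence[float], window: int = 5, threshold: float = 3.0) -> list[int]:
    """Return indexes whose value is more than `threshold` standard deviations
    from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window : i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Flat history: any change at all is unusual.
            if series[i] != mu:
                flagged.append(i)
        elif abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

For CPU readings of 50, 52, 49, 51, 50, 51, 95, 50, only the spike to 95 (index 6) is flagged.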

Monitoring Health

Monitoring application health is crucial to ensuring the desired scalability and performance of cloud applications. A healthy application operates efficiently and meets user demands. An unhealthy application experiences issues such as slow response times, high error rates, and crashes. Monitoring and acting on application health enables early detection of issues, optimizes performance, improves user experience, and reduces cost. OCI Monitoring lets you actively and passively monitor cloud resources, using its metrics and alarms features to collect data and act on it.
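A common building block for health monitoring is a health check endpoint that a load balancer or monitoring probe can poll. For example, OCI Load Balancer health checks can poll a URL path on each backend. The sketch below uses only the Python standard library. The `/health` path, the JSON shape, and `check_dependencies` are hypothetical placeholders for your own checks.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies() -> dict[str, bool]:
    """Hypothetical placeholder: replace with real checks (database ping, downstream call, disk space)."""
    return {"database": True, "cache": True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        checks = check_dependencies()
        healthy = all(checks.values())
        body = json.dumps({"status": "UP" if healthy else "DOWN", "checks": checks}).encode()
        # A status other than the expected 200 fails the probe,
        # so the load balancer can take this backend out of rotation.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep probe traffic out of the logs


def serve(port: int = 8080) -> None:
    """Run the health endpoint until the process is stopped."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Keep health checks cheap and fast, because probes run frequently against every backend.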

Troubleshooting Performance Issues

Troubleshooting performance issues in cloud environments can require a different approach, because your access levels in the cloud might differ from those on your local system or on-premises infrastructure. Steps that can help you troubleshoot performance issues in the cloud are:

  • Define the problem.
  • Collect the data.
  • Analyze the data.
  • Identify the root causes.
  • Test and isolate.
  • Address the issue.
  • Monitor and validate.

Consider reviewing common troubleshooting steps provided with each OCI service.

Identify Service Limits

It's important to consider individual cloud service limits to make your system highly scalable and achieve maximum capacity. Knowing the performance limits lets you design your system to operate within them and avoid performance bottlenecks or service interruptions. For example, every OCI Compute shape, whether bare metal or virtual machine, has known limits and issues to consider, such as OCPU count, memory, and network bandwidth. When a single instance isn't enough, you can increase performance by deploying additional capacity behind an OCI Load Balancer.

Independent Service SLA

Designing a high-performance, scalable system requires thorough consideration of each cloud service's SLA to reduce downtime, even in the case of a disaster. Each cloud service has a well-defined SLA and Service Level Objectives (SLOs). You might need to add redundancy to meet or exceed the availability of individual services, particularly the services with the lowest SLAs. OCI publishes Service Level Objectives for its PaaS and SaaS public cloud services for you to consider in your design and architecture.
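You can estimate the effect of redundancy on availability with two standard formulas. When every service must work (services in series), multiply their availabilities. N redundant copies fail only if all of them fail. Both formulas assume that failures are independent, which is optimistic for copies that share a fault domain or region. The percentages below are hypothetical, not published OCI SLOs.

```python
from math import prod
from typing import Sequence

MINUTES_PER_30_DAYS = 30 * 24 * 60


def serial_availability(availabilities: Sequence[float]) -> float:
    """All components are required: availabilities multiply."""
    return prod(availabilities)


def redundant_availability(availability: float, copies: int) -> float:
    """Any one of `copies` independent replicas is enough."""
    return 1 - (1 - availability) ** copies


def downtime_minutes(availability: float, period_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Expected downtime over the period for a given availability."""
    return (1 - availability) * period_minutes
```

Two components at 99.9% in series give about 99.8% availability, or roughly 86 minutes of downtime per 30 days. Two independent 99.9% replicas give about 99.9999%.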

Release Notes

OCI constantly updates its services with new features, capabilities, and patches. Staying aware of the latest changes helps you improve your applications and adopt new features. Updates can also include bug fixes and security updates, notices of service disruptions and downtime, changes to pricing and service plans, and compliance and regulatory changes. OCI Release Notes provide a consolidated view of announcements and changes across cloud services.

Cost Analysis Versus Capacity

For scalability, consider your size and shape requirements, and add an additional cap. Choose the cloud service suited to your needs rather than an expensive service whose cost you can't justify. To reduce cost and latency, avoid unnecessary services and extra layers in your system. A thorough analysis of cloud service cost, features, and availability is crucial to designing a cost-effective system.