Search with OpenSearch Cluster Performance Sizing
Learn how to effectively size your OpenSearch cluster.
OpenSearch is an open-source search and analytics engine that powers everything from log analytics and observability to full-text search. It handles both structured and unstructured data, offering real-time search, analytics, and visualization of large-scale data. Whether you're using managed OpenSearch from a cloud provider or running it on-premises, sizing your OpenSearch cluster appropriately is essential for both performance and cost efficiency.
Sizing an OpenSearch cluster can be challenging, as no one-size-fits-all formula exists. Improper sizing can lead to slow query performance, inefficient indexing, or even downtime. This topic covers the key factors that influence cluster sizing and provides actionable guidance to get you started. The topic also walks you through a capacity planning exercise to put the concepts into practice.
Factors Affecting Performance and Cost
The process of sizing and determining the capacity of an OpenSearch cluster involves balancing cost and performance. Several factors influence both performance and sizing decisions, making it essential to gather data on these aspects to fine-tune your capacity planning process.
You can group these factors into the following categories: use-case specific, data-related, and resource-related considerations.
- Use-case specific: These factors are the intended purpose of the OpenSearch deployment. For example, is the cluster being used for log analytics or observability, which typically involves a higher volume of writes compared to reads? Or is it being used for application search, where data is written once and read many times? Alternatively, is the cluster intended to support resource-intensive machine learning workloads? Understanding your use case helps determine the required read and write latency and throughput, query patterns, and the types of queries, such as filters, aggregations, or basic searches, that need to be optimized.
- Data-related: These factors include the volume of data to be stored, the data type (structured or unstructured), document size, cardinality, field types, and whether field mappings are defined. The compression codec in use and the size and number of indices, shards, and replicas also contribute to the overall sizing and performance of the cluster.
- Resource-related: These factors involve the configuration and number of nodes in the cluster, including cluster manager nodes, data nodes, and dashboard nodes. These resources directly impact the cluster's ability to handle the defined workload and ensure best performance.
Steps for Sizing the OpenSearch Cluster
Although capacity planning can initially seem overwhelming, it becomes more manageable as you experiment and address concerns incrementally. The key is to start with an initial estimate based on established best practices, continuously assess whether your key performance indicators (KPIs) are being met, and iterate until you achieve the ideal balance between cost and performance. The following steps outline the capacity planning process:
Estimating Storage Requirements
In this section, you can apply the principles discussed to a realistic cluster sizing exercise. Imagine you have a log analytics use case where you ingest data across several indexes in an OpenSearch cluster at a rate of 10 GB per day. Your goal is to retain the data for a maximum of 90 days, after which you can safely discard it.
Data ingestion can be either in batches or continuous, depending on the use case. In many application search and business intelligence scenarios, data is loaded in bulk during batch processes and then queried for various operations such as filtering, aggregation, and search. These types of indexes are considered long-lived, and their storage requirements can typically be estimated by analyzing the source data, which tends to change infrequently.
For other use cases, such as log analytics, time-series analysis, and real user monitoring, data is continuously ingested into the cluster, with indices rolling over after a certain size or time threshold is reached. For these types of indexes, it's essential to calculate the volume of data ingested over a specific time period and then multiply that by the required retention period to estimate storage requirements.
In addition to the total volume of data being ingested, several other factors influence the amount of storage required:
- Retention period (r): The duration for which data is retained in the cluster determines the maximum amount of data stored at any given time. This figure helps estimate the cluster's capacity to handle the worst-case scenario, where the largest amount of data is present, ensuring acceptable performance even under peak storage conditions.
- Replica count (rc): Each replica is an exact copy of its primary shard, and it serves both reliability and performance purposes. To prevent data loss, we recommend at least one replica, which is the default setting for every index. If your use case involves high read volumes, it might be beneficial to configure more than one replica. This allows several replica shards to handle concurrent search queries, distributing the load and improving query performance.
- Compression ratio (cr): The data stored on disk isn't identical to the raw data volume being ingested. Instead, it's typically smaller because of the compression applied during the indexing process. OpenSearch supports various compression codecs, each offering a different trade-off between compression ratio and performance. You can select the most appropriate codec for your use case during index creation.
The compression ratio achieved depends on several factors, including the type of data (structured vs. unstructured), the number of repeated or empty field values, data cardinality, and the overall size of the data. A reasonable starting assumption for compression is typically between 25% and 40%. For example, 10 GB of raw data would occupy between 6 GB (cr=1.67) and 7.5 GB (cr=1.33) on disk. However, it's important to validate this assumption with actual testing for your specific use case.
- OpenSearch indexing overhead (o1): OpenSearch incurs up to 10% storage overhead to index raw data. After indexing, you can use the `/_cat/indices?v` API and read the `pri.store.size` value to measure the actual overhead.
- OpenSearch service overhead (o2): OpenSearch reserves up to 20% of the storage space on each instance, with a maximum of 20 GB, for segment merges, logs, and other internal operations.
- Operating system overhead (o3): By default, Linux reserves 5% of the file system for the root user so that critical system processes, such as logging and temporary file creation, continue to function when the disk is almost full. This reserved space also helps with system recovery and prevents disk fragmentation, avoiding operational disruptions.
- Error margin buffer (b): We recommend you add a 5–10% buffer to account for human error and overly optimistic assumptions, as the primary shard count can't be changed after an index is created. You can adjust or remove this buffer later as you gather actual usage data.
- Formula: Required Storage = Source Data * r * (1 + rc) * (1/cr) * (1 + o1) * (1/(1 - o2)) * (1/(1 - o3)) * (1 + b)

Applying this formula to the example exercise results in the following:
- Source Data = 10 GB per day
- r = 90, considering retention period of 90 days
- rc = 1, considering we want 1 replica
- cr = 1.33, assuming 25% compression (data on disk occupies 75% of its raw size)
- o1 = 0.1 (10% indexing overhead)
- o2 = 0.2 (20% service overhead)
- o3 = 0.05 (5% operating system overhead)
- b = 0.1 (10% error margin buffer)
- Required Storage = 10 * 90 * 2 * 0.75 * 1.1 * (1/0.8) * (1/0.95) * 1.1 ≈ 2150 GB ≈ 2.2 TB (rounded)
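As a sanity check, the storage formula can be evaluated in a few lines. The values mirror the exercise's assumptions; with cr = 1.33 the result lands near 2155 GB, matching the ~2.2 TB figure above.

```python
# Worked storage estimate for the example exercise (all values are the
# assumptions listed above; adjust them to your own workload).
source_gb_per_day = 10
r = 90        # retention period in days
rc = 1        # replica count
cr = 1.33     # compression ratio (~25% compression)
o1 = 0.10     # indexing overhead
o2 = 0.20     # service overhead
o3 = 0.05     # operating system overhead
b = 0.10      # error-margin buffer

required_gb = (source_gb_per_day * r * (1 + rc) * (1 / cr)
               * (1 + o1) / (1 - o2) / (1 - o3) * (1 + b))
print(f"Required storage: {required_gb:.0f} GB")
```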
Defining the Sharding Strategy
After you've established your storage requirements, the next step is to define your sharding strategy. This includes determining the size, number, and distribution of your shards. OpenSearch shards are independent partitions of your index data, designed to distribute data across cluster nodes to ensure high performance and fault tolerance.
There are two types of shards: primary and replica. A primary shard is the authoritative copy of a data partition and handles both read and write operations. In contrast, a replica shard is an exact copy of the primary shard and is used solely for read operations. All write requests are first directed to the primary shard, after which the data is replicated to the replica shards.
OpenSearch automatically allocates both primary and replica shards across nodes to ensure that primary shards are distributed as evenly as possible and that primary and replica shards aren't placed on the same node. For example, an index with three primary shards and one replica each results in six shards in total: three primary and three replica shards. In this configuration, each write request is handled by two shards (one primary and one replica), while read requests can be served by any combination of primary and replica shards, providing redundancy and load balancing.
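Sketched as an index-creation request (the index name `logs-sample` is illustrative; `number_of_shards` and `number_of_replicas` are the standard index settings), the three-primary/one-replica layout above looks like this:

```
PUT /logs-sample
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}
```

Note that `number_of_replicas` can be changed at any time, while `number_of_shards` is fixed at index creation.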
Shard Size
Each shard in OpenSearch is an independent Lucene index, meaning its size is constrained by the hardware resources available to maintain reasonable performance. However, it's crucial to avoid both too few and too many shards. Having too few large shards can lead to slower search performance and complicate fault tolerance, as moving large shards between nodes becomes time-consuming in the event of a failure. Conversely, too many shards can lead to inefficient resource usage and can lead to overly distributed queries, resulting in time-consuming aggregations and slower query performance.
For read-heavy workloads, which require more frequent lookups across the Lucene index, we recommend you keep shard sizes between 10–30 GB to ensure low latency. For write-heavy workloads, shard sizes can be larger, typically between 30–50 GB, to accommodate higher data ingestion rates without compromising performance. Because this example use case is log analytics, which is write-heavy, you can target a shard size of 40 GB.
Shard Count
The primary goal when deciding the number of shards is to distribute the index data evenly across all data nodes in the cluster. You can calculate the shard count by dividing the total data volume by the required shard size. Ensure that the shard count is a multiple of the number of data nodes in the cluster so that the distribution of shards remains balanced. For example, if you have 8 primary shards, you can opt for 2, 4, or 8 data nodes. Choosing 3 data nodes in this case would result in an uneven distribution, with some nodes holding more shards than others, leading to uneven load across the cluster.
The following equation provides the shard count based on the total data you plan to store for the index and your preferred shard size
Shard Count = Total Index Size / Desired Shard Size
This data is ingested gradually over time rather than all at once. As a result, the shard count calculated using this formula might lead to overly small shards initially, but over time they will likely reach the preferable size. To account for this discrepancy during the early stages, before the data grows to its estimated volume, we recommend you adopt a middle ground by adjusting the shard count to a more appropriate value based on current needs.
For the example exercise, the total shard count, including replicas, would be approximately 50, calculated as 2150 GB / 40 GB shard size. The primary shard count should therefore be around 25, and we recommend you aim for a primary shard count close to this number.
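A minimal sketch of this arithmetic (rounding up with ceiling division is an assumption; the exact quotient is 53.75, which the text rounds to roughly 50 total shards and 25 primaries):

```python
import math

total_index_gb = 2150    # total storage estimate, replicas included
desired_shard_gb = 40    # target shard size for this write-heavy workload

total_shards = math.ceil(total_index_gb / desired_shard_gb)
primary_shards = math.ceil(total_shards / 2)  # one replica per primary
print(total_shards, primary_shards)  # → 54 27
```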
Configuring the Data Node
After you've determined your storage requirements and decided the size and number of shards, you can begin making hardware decisions. While hardware requirements can vary depending on the workload, here are some general recommendations.
Heap Memory
OpenSearch typically uses 50% of the available memory on a data node for its heap, with the remainder left to the operating system. For example, if a data node has 32 GB of memory, OpenSearch uses 16 GB for its heap. The recommended maximum heap size for most cases is 32 GB; beyond this limit, the JVM can no longer use compressed object pointers, and the added garbage collection overhead can negate potential performance benefits. The recommended maximum memory for a data node is therefore 64 GB, with 32 GB allocated to the OpenSearch heap. Evaluate nodes with more than 64 GB of memory only for special cases, after a careful cost-benefit analysis.
For this exercise, each of our data nodes is an 8 vCPU/32 GB machine, which leaves 16 GB of heap for use by OpenSearch.
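For self-managed deployments, the heap size is pinned in the `config/jvm.options` file (standard install layout; managed services expose this setting differently). A minimal sketch for the 32 GB node above:

```
# config/jvm.options: set min and max heap to 16 GB (half of the node's 32 GB RAM)
-Xms16g
-Xmx16g
```

Setting `-Xms` and `-Xmx` to the same value avoids heap resizing pauses at runtime.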
Storage
Typically, the memory-to-storage ratio on data nodes ranges from 1:16 (for workloads with high search activity or numerous active shards) to 1:64 (for write-heavy workloads or nodes hosting less frequently accessed shards). For example, a node with 64 GB of RAM is typically recommended to have between 1 and 4 TB of storage. However, the ideal storage size per node can vary and should be determined through experimentation. Sometimes, a different ratio might be more appropriate.
For this exercise's data node configuration of 8 vCPUs and 32 GB of memory, a good storage size to attach, using a memory-to-storage ratio of 1:16, is around 32 * 16 = 512 GB. With 2150 GB of data to store, we need at least five nodes, each with 512 GB of storage.
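The node-count arithmetic above can be sketched as follows (512 GB per node is the assumed 1:16 attachment; the floor of two nodes reflects the minimum needed for replication):

```python
import math

required_storage_gb = 2150
storage_per_node_gb = 32 * 16   # 32 GB RAM at a 1:16 memory-to-storage ratio

# At least two nodes are needed so that replicas can live on separate nodes.
data_nodes = max(2, math.ceil(required_storage_gb / storage_per_node_gb))
print(data_nodes)  # → 5
```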
Shards Per Node
OpenSearch uses heap memory primarily to store segment metadata for each shard, which enables quick retrieval of data locations during search operations. This metadata includes information about where data resides on disk within each segment, allowing OpenSearch to avoid scanning the entire shard during searches, thus improving performance.
However, a limit exists as to how many shards a given amount of heap memory can efficiently support. As a final check, after considering shard size, heap memory, and storage guidelines, ensure that the number of shards for each GB of heap memory doesn't exceed 25. For example, a data node with 32 GB of heap memory should host no more than 800 shards at any time.
Verify that you're under the shards-per-node limit. For our example, with five data nodes and 25 primary shards with one replica each, each node hosts a total of 10 shards. With 16 GB of heap per node, that's well under the limit of 25 shards per GB of heap.
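The same check, sketched with the exercise's numbers (the 25-shards-per-GB-of-heap ceiling comes from the guideline above):

```python
data_nodes = 5
total_shards = 50            # 25 primaries + 25 replicas
heap_gb_per_node = 16        # half of each node's 32 GB RAM
max_shards_per_gb = 25       # guideline ceiling

shards_per_node = total_shards / data_nodes
shard_budget_per_node = heap_gb_per_node * max_shards_per_gb
print(shards_per_node, shard_budget_per_node)  # → 10.0 400
assert shards_per_node <= shard_budget_per_node
```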
CPU Cores
OpenSearch is a CPU-intensive application, as it handles tasks like indexing, searching, and aggregating data. As a result, having enough CPU cores is crucial to ensure these operations are performed efficiently. The exact number of CPU cores required varies depending on the workload and specific use case.
Each active shard (a shard that's being read from or written to) requires one vCPU to handle its operations. As a best practice, we recommend you start with at least one vCPU per active shard. For example, if each of your data nodes has 8 vCPUs, try to configure your cluster so that no data node hosts more than eight active shards.
However, if your cluster has many shards, performs heavy aggregations, updates documents often, or processes many complex queries, these resources might not be enough. A good starting point is to allocate 2 vCPUs and 8 GB of memory for every 100 GB of storage requirement. This amount is an approximation and should be fine-tuned based on your specific KPIs.
For our example exercise, considering that 4 of the 10 shards on each node are actively in use for indexing or searching operations, a node with 6–8 vCPU cores is a good fit.
Data Node Count
Calculate the number of data nodes needed to host all the shards based on the available resources (cores and heap memory) on each data node and the total number of shards. We recommend at least two data nodes in a cluster to enable replication and ensure fault tolerance and reliability. For larger clusters, plan for scalability. The maximum number of data nodes a cluster can typically stretch to is 180–200; beyond that, consider breaking the cluster into smaller, more manageable units. Smaller clusters of around 40–50 nodes are preferable, as they offer better operational manageability.
A good starting point for our example cluster is to have 5 data nodes, each having 8 vCPUs, 32 GB memory and 512 GB of storage. Each data node hosts a total of about 10 shards of 40 GB each.
Configuring the Cluster Manager Node
To enhance cluster stability and performance, we recommend you use dedicated cluster manager nodes. These nodes offload critical management tasks from the data nodes, allowing them to focus solely on data storage and query handling. Cluster manager nodes don't store data or handle data upload requests. Instead, they manage the overall health and state of the cluster. They track all nodes in the cluster, monitor the number of indices, and oversee shard distribution for each index.
Manager nodes maintain routing information, update the cluster state during changes (such as creating indices or adding/removing nodes), and replicate state updates across all nodes. They also send heartbeat signals to monitor the health and availability of data nodes, ensuring the cluster remains operational.
We recommend you have at least three cluster manager nodes for best reliability. Having only one manager node is risky as the cluster could become unavailable if that node fails. Also, using an even number of manager nodes isn't ideal as it prevents the cluster from forming the necessary quorum to elect a new manager in case of failure. Three manager nodes provide two backup nodes in the event of a failure and ensure a quorum for electing a new manager.
Increasing the number of manager nodes in odd increments (5, 7, etc.) can enhance fault tolerance, allowing the cluster to withstand more manager node failures while remaining operational. However, adding more than three manager nodes is excessive and should only be considered for exceptionally large clusters or special use cases after thorough analysis.
Although dedicated cluster manager nodes don't handle search and query requests, their resource requirements are closely tied to the number of data nodes, indices, and shards they need to manage. The following guideline can serve as a starting point:
- 4 vCPU and 8 GB memory: up to 16 data nodes
- 8 vCPU and 16 GB memory: 17–32 data nodes
- 8 vCPU and 32 GB memory: 33–64 data nodes
- 16 vCPU and 64 GB memory: 65–128 data nodes
For our exercise, because we have five data nodes, we can start with having three cluster manager nodes each with 4 vCPU and 8 GB memory.
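The guideline table can be captured as a small lookup helper (`manager_node_size` is a hypothetical name; the tiers simply restate the list above):

```python
def manager_node_size(data_nodes: int) -> str:
    """Suggested dedicated cluster-manager node size for a given data node count."""
    if data_nodes <= 16:
        return "4 vCPU / 8 GB"
    if data_nodes <= 32:
        return "8 vCPU / 16 GB"
    if data_nodes <= 64:
        return "8 vCPU / 32 GB"
    if data_nodes <= 128:
        return "16 vCPU / 64 GB"
    return "consider splitting the cluster"

print(manager_node_size(5))  # → 4 vCPU / 8 GB
```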
Configuring the Dashboard Node
Dashboard nodes are the easiest node type to size, as they don't directly index or search data locally and have minimal storage requirements. These nodes primarily function as API servers, issuing requests to the cluster and handling data I/O. The key resources required for dashboard nodes are CPU and memory, which depend on the number of simultaneous dashboard requests.
While it's common to run a single dashboard node in smaller clusters, and even in production, we recommend at least two nodes for a highly available (HA) OpenSearch Dashboards deployment. A good starting point is to deploy one or two dashboard nodes, depending on your HA needs. For low-throughput scenarios, allocate 4 vCPUs and 8 GB of memory per node; for high-throughput scenarios, allocate 8 vCPUs and 16 GB of memory per node. After deployment, monitor CPU utilization and JVM pressure to fine-tune resources based on actual performance.
Given that your log analytics application is primarily used by a limited number of professionals for root cause analysis, debugging, and occasional review and reporting, you can begin with a single dashboard node with 8 vCPUs and 16 GB of memory. You can then adjust the resources, scaling up or down, based on usage patterns and the performance of this initial configuration.
Testing and Iterating
Perform the following tests and adjust your configuration as needed:
- Use Production-Like Test Data: Ensure your test data closely mirrors the production data that is ingested into the cluster. This includes matching the structure of documents (fields, data types, etc.) and their size to what is used in production.
- Simulate Real-World Load: Conduct performance tests under real-world conditions. Start by creating a small cluster that simulates the expected load, then extrapolate results to estimate the performance of larger clusters. Your test data should be of significant volume (in GBs) to accurately reflect the load.
- Monitor and Analyze Key Metrics: Continuously monitor critical metrics such as CPU utilization, JVM pressure, disk usage, search and indexing latency, search and indexing rates, and overall availability. Adjust the cluster configuration as needed to meet the desired KPIs.
- Continuous Monitoring in Production: Ongoing monitoring of your production cluster is essential. Regularly check these KPIs, and take corrective action if performance deviates from expected thresholds. Relying solely on one-time tests before going live isn't enough. Continuous validation is critical to maintaining the best performance.
The cluster size you have decided for this example use case represents an initial starting point. The final configuration suitable for your specific use case can vary based on the results of your performance tests.
Conclusion
Sizing an OpenSearch cluster requires a thoughtful approach that balances performance and cost, considering factors such as use case, data volume, and resource allocation. This topic has provided key guidance, best practices, and rules of thumb to help you get started with capacity planning. By following a structured process and iterating based on performance tests, you can ensure that your cluster is optimized for both efficiency and scalability. Regular monitoring and adjustments in response to real-world load are essential to maintaining peak performance. The optimal cluster configuration evolves as you gather insights from your specific workload and usage patterns.