Deploy a High-Performance Storage Cluster Using IBM Spectrum Scale

IBM Spectrum Scale is a cluster file system that provides concurrent access to one or more file systems from multiple nodes. The nodes can be SAN-attached, network-attached, a mixture of SAN-attached and network-attached, or in a shared-nothing cluster configuration. Spectrum Scale enables high-performance access to a common set of data to support a scale-out solution or to provide a high-availability platform.

Architecture

One use case for Spectrum Scale is deploying SAS Grid applications that need a robust I/O subsystem. This reference architecture describes how to deploy a high-throughput I/O solution by using an IBM Spectrum Scale file system on Oracle Cloud Infrastructure.

This reference architecture uses a region with one availability domain and regional subnets. You can use the same reference architecture in a region with multiple availability domains. We recommend that you use regional subnets for your deployment, regardless of the number of availability domains.

The following diagram illustrates this reference architecture.

Description of the illustration specter-oci.png

The Spectrum Scale file system architecture has the following components:

CES node
Cluster Export Services (CES) nodes serve integrated protocol functions: they provide SMB, NFS, or Object access to data in the IBM Spectrum Scale file system. CES nodes are optional. We recommend using a VM.Standard2.8 or higher shape (at least two VNICs) for higher throughput.
MGMT GUI node
This node provides a graphical interface for monitoring the Spectrum Scale file system. This node is optional. We recommend using a VM.Standard2.16 or higher shape to provide sufficient OCPU and memory.
Client node
These nodes use the Spectrum Scale file system. They are served disk data by the Network Shared Disk (NSD) servers.
NSD server
These servers use the NSD protocol to serve data to client nodes in a client-server model. NSD servers provide access to storage that is visible on the servers as local block devices.
Object Storage
Oracle Cloud Infrastructure Object Storage is a durable and scalable internet-scale storage service.
Virtual cloud network (VCN) and subnets
A VCN is a software-defined network that you set up in an Oracle Cloud Infrastructure region. VCNs can be segmented into subnets, which can be specific to a region or to an availability domain. Both region-specific and availability domain-specific subnets can coexist in the same VCN. A subnet can be public or private.
Security lists
For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.
Availability domains
Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.
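
The security list component described above pairs a traffic source with a traffic type. The following minimal Python sketch models that idea so the matching logic is easier to see. It is an illustration only and does not use the Oracle Cloud Infrastructure SDK: the IngressRule class, the is_allowed function, and the 10.0.1.0/24 subnet are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical model of one ingress rule (not an OCI SDK type)."""
    source_cidr: str  # where the traffic is allowed to come from
    protocol: str     # "icmp", "tcp", "udp", or "all"


def is_allowed(rules, source_ip, protocol):
    """Return True if any rule admits traffic from source_ip using protocol."""
    addr = ip_address(source_ip)
    for rule in rules:
        # Skip rules that cover a different traffic type.
        if rule.protocol not in ("all", protocol):
            continue
        if addr in ip_network(rule.source_cidr):
            return True
    return False


# Assumed example: a private subnet that accepts ICMP only from its own range.
private_rules = [IngressRule(source_cidr="10.0.1.0/24", protocol="icmp")]
```

With these assumed values, is_allowed(private_rules, "10.0.1.15", "icmp") is true, while the same call with a source outside 10.0.1.0/24 or with "tcp" as the protocol is false.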

Recommendations

Your requirements might differ from the architecture described here. Use the following recommendations as a starting point.

Compute shape, bastion host
A bastion host is used to access any nodes in the private subnet. Use the VM.Standard.E2.1 or the VM.Standard.E2.2 shape.
Compute shape, CES node
Use a VM.Standard2.8 or higher shape (at least two VNICs) for higher throughput.
Compute shape, MGMT GUI node
Use a VM.Standard2.16 or higher shape to provide sufficient OCPU and memory.
Compute shape, client node
You can have multiple client nodes. Start with a VM.Standard2.24 shape and scale up or down as needed.
Compute shape, NSD server
NSD servers require high throughput and processing power. Use a BM.Standard2.52 or BM.Standard.E2.64 shape. Also, use at least two NSD server nodes.
VCN
When you create the VCN, determine how many IP addresses your cloud resources in each subnet require. Using the Classless Inter-Domain Routing (CIDR) notation, specify a subnet mask and a network address range that's large enough for the required IP addresses. Use an address range that's within the standard private IP address space.

Select an address range that doesn’t overlap with your on-premises network, so that you can set up a connection between the VCN and your on-premises network, if necessary.

After you create a VCN, you can't change its address range.

When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

Use regional subnets.
Security lists
Use security lists to define ingress and egress rules that apply to the entire subnet. For example, this architecture allows ICMP internally for the entire private subnet.
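
The VCN guidance above (size each range for the addresses you need, stay within private address space, and avoid overlap with your on-premises network) can be checked before you create anything. The following sketch uses Python's standard ipaddress module. The function names are hypothetical, and two values are assumptions you should confirm against the current Oracle Cloud Infrastructure networking documentation: that three addresses in each subnet are reserved, and that prefixes from /16 through /30 are the permitted sizes.

```python
from ipaddress import ip_network

# RFC 1918 private IPv4 ranges.
PRIVATE_RANGES = [
    ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def usable_hosts(cidr, reserved=3):
    """Addresses left for resources after `reserved` are set aside (assumed 3)."""
    return max(ip_network(cidr).num_addresses - reserved, 0)


def smallest_prefix(required_ips, reserved=3):
    """Longest prefix (smallest range) that still fits required_ips addresses."""
    # Assumed permitted sizes: /30 (smallest) through /16 (largest).
    for prefix in range(30, 15, -1):
        if 2 ** (32 - prefix) - reserved >= required_ips:
            return prefix
    raise ValueError("requirement does not fit in a /16")


def is_private(cidr):
    """True if the range lies entirely inside one RFC 1918 block."""
    net = ip_network(cidr)
    return any(net.subnet_of(block) for block in PRIVATE_RANGES)


def overlaps(cidr_a, cidr_b):
    """True if the two ranges share at least one address."""
    return ip_network(cidr_a).overlaps(ip_network(cidr_b))
```

For example, a subnet that must hold 50 resources needs at least a /26 under these assumptions, and a candidate VCN range of 10.0.0.0/16 would conflict with an on-premises network that already uses 10.0.1.0/24.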

Considerations

Performance
To get the best performance, choose a Compute shape that provides enough network bandwidth for your workload.
Availability
Consider using a high-availability option based on your deployment requirements.
Cost
Bare metal instances provide higher performance on I/O operations for a higher cost. Evaluate your requirements to choose the appropriate Compute shape.
Monitoring and alerts
Set up monitoring and alerts on CPU and memory usage for your nodes so that you can scale the shape up or down as needed.
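
The monitoring consideration above amounts to a threshold decision on CPU and memory usage. The following sketch shows one way to express that decision in Python. The scaling_hint function and the 80 percent and 20 percent thresholds are assumptions made for illustration. They are not values from this architecture, and the sketch does not call any monitoring service.

```python
from statistics import mean


def scaling_hint(cpu_samples, mem_samples, high=80.0, low=20.0):
    """Suggest 'scale up', 'scale down', or 'hold' from utilization percentages.

    The thresholds are illustrative defaults; tune them to your workload.
    """
    cpu = mean(cpu_samples)
    mem = mean(mem_samples)
    # Either resource running hot is enough to suggest a larger shape.
    if cpu > high or mem > high:
        return "scale up"
    # Suggest a smaller shape only when both resources are mostly idle.
    if cpu < low and mem < low:
        return "scale down"
    return "hold"
```

Requiring both metrics to be low before suggesting a smaller shape is a deliberately conservative choice, because a node with idle CPU can still be memory-bound.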

Deploy

The Terraform code to deploy this reference architecture is available on GitHub.

Go to GitHub.
Clone or download the repository to your local computer.
Follow the instructions in the README document.
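
The steps above can be sketched as a small Python helper that lists the commands in order. This is an assumption about a typical Terraform workflow (init, plan, apply), not a substitute for the README, which may require extra steps such as setting variables or credentials. The repository URL is not given here, so it is passed in as a parameter, and deploy_commands and run are hypothetical names.

```python
import subprocess


def deploy_commands(repo_url, workdir):
    """Return (argv, cwd) pairs for a typical clone-then-Terraform workflow."""
    return [
        (["git", "clone", repo_url, workdir], None),
        (["terraform", "init"], workdir),   # download providers and modules
        (["terraform", "plan"], workdir),   # preview the changes
        (["terraform", "apply"], workdir),  # create the resources
    ]


def run(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for argv, cwd in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, cwd=cwd, check=True)
```

Calling run(deploy_commands(repo_url, "spectrum-scale")) with the default dry_run=True only prints the commands, which makes it easy to compare them with the README before running anything.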