Oracle8i Parallel Server Concepts
Release 2 (8.1.6)

Part Number A76968-01


2 Parallel Hardware Architecture

This chapter describes the hardware components and various high-level architectural models that typify cluster environments. The model you select to deploy your Oracle Parallel Server application depends on your processing goals.

Oracle Parallel Server environments are typically deployed with several nodes interconnected to form a cluster. This chapter explains the basic hardware for nodes as well as the hardware that is used to make the nodes into a cluster.

Topics in this chapter include:

Overview of Cluster Hardware Components
Memory Access
The High Speed Interconnect
Clusters - Nodes and the Interconnect
Storage Access in Clustered Systems
Oracle Parallel Server Runs on a Wide Variety of Clusters

Overview of Cluster Hardware Components

A cluster comprises two or more nodes that are linked by an interconnect. The interconnect serves as the communication path between the nodes in the cluster. The nodes use the interconnect for communication required to synchronize each instance's manipulation of the shared data. The shared data that the nodes access resides in storage devices. A cluster is also known as a "loosely coupled computer system".

The following sections describe these components in more detail.

What is a Node?

A node has four main components:

One or more CPUs
Memory
An interconnect
Storage

You can purchase these components in a number of different configurations. Their arrangement determines how each node in a cluster accesses memory and storage.

All clusters use CPUs in more or less the same manner. However, the remaining components (memory, storage, and the interconnect) can be configured in different ways for different purposes. The remaining sections of this chapter explain how clusters use these components by describing:

Memory Access
The High Speed Interconnect
Clusters - Nodes and the Interconnect
Storage Access in Clustered Systems

Memory Access

Multiple CPUs are typically configured to share main memory. This allows you to create a single computer system that delivers scalable performance. This type of system is also less expensive to build than a single CPU with equivalent processing power. A computer with a single CPU is known as a "uniprocessor".

There are two configurations of shared memory systems:

Uniform Memory Access (UMA)
Non-Uniform Memory Access (NUMA)

Shared memory systems are also known as "tightly coupled computer systems".

Uniform Memory Access

In uniform memory access configurations, or UMA, all processors can access main memory at the same speed. In this configuration, memory access is uniform. This configuration is also known as a Symmetric Multi-Processing system or "SMP".

Non-Uniform Memory Access

Non-uniform memory access, or NUMA, means that all processors have access to all memory structures. However, the memory accesses are not equal. In other words, the access cost varies depending on what parts of memory each processor accesses. In NUMA configurations, the cost of accessing a specific location in main memory is different for some of the CPUs relative to others.

Performance in both UMA/SMP and NUMA systems is limited by memory bus bandwidth. This means that as you add CPUs to the system beyond a certain point, performance will not increase linearly. The point at which adding CPUs results in minimal performance improvement varies by application type and by system architecture. Typically SMP configurations do not scale well beyond 24 to 64 processors.
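
The scaling limit can be made concrete with a toy model. The following C sketch is purely illustrative and is not taken from this manual; the bus bandwidth and per-CPU demand figures are invented. It models a fixed shared memory bus: speedup grows linearly with the CPU count until the combined demand of the CPUs saturates the bus, after which adding CPUs yields little or no further improvement.

/*
 * Hypothetical illustration (not from this manual): why adding CPUs to a
 * shared memory system gives diminishing returns once the shared memory
 * bus saturates.  The bandwidth numbers below are invented examples.
 */
#include <stdio.h>

int main(void)
{
    const double bus_bandwidth_mb  = 800.0;  /* assumed total bus bandwidth (MB/s) */
    const double per_cpu_demand_mb = 50.0;   /* assumed demand per busy CPU (MB/s) */

    for (int cpus = 1; cpus <= 32; cpus *= 2) {
        double demand = cpus * per_cpu_demand_mb;

        /* Each CPU runs at full speed only while the bus can feed it data. */
        double speedup = (demand <= bus_bandwidth_mb)
                             ? (double)cpus
                             : bus_bandwidth_mb / per_cpu_demand_mb;

        printf("%2d CPUs -> effective speedup ~ %.1f\n", cpus, speedup);
    }
    return 0;
}

With these example numbers, speedup levels off at 16 CPUs, the point at which the assumed 800 MB/s bus is saturated. Real systems degrade more gradually, which is why the practical scaling limit varies by application type and system architecture.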

Figure 2-1 Tightly Coupled Shared Memory System or SMP/UMA


Advantages of Shared Memory

The parallel processing advantages of shared memory systems are:

A disadvantage of shared memory systems for parallel processing is that scalability is limited by the bandwidth and latency of the bus and by available memory.

The High Speed Interconnect

The high speed interconnect is a high-bandwidth, low-latency communication facility that connects each node to the other nodes in the cluster. It routes messages and other parallel processing-specific traffic among the nodes to coordinate each node's access to the data and to the data-dependent resources.

Oracle Parallel Server also makes use of user-mode interprocess communication (IPC) and "memory-mapped IPC". These substantially reduce CPU consumption and IPC latency.

You can use Ethernet, FDDI (Fiber Distributed Data Interface), or other proprietary hardware for your interconnect. You should also have a backup interconnect available in case your primary interconnect fails. The backup interconnect enhances high availability and reduces the likelihood of the interconnect becoming a single point of failure.
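
As an informal illustration only, the following C sketch shows the kind of traffic the interconnect carries and how a backup path can be used when the primary path fails. It is not Oracle code; the peer addresses, port number, and message contents are invented for the example.

/*
 * Hypothetical sketch (not Oracle code): send a small coordination message
 * to a peer node over the primary interconnect, falling back to a backup
 * interconnect address if the primary send fails.  Addresses, port, and
 * message contents are invented.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int send_over(const char *addr, const char *msg)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(7777);                  /* example port */
    if (inet_pton(AF_INET, addr, &peer.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    ssize_t n = sendto(fd, msg, strlen(msg), 0,
                       (struct sockaddr *)&peer, sizeof peer);
    close(fd);
    return (n < 0) ? -1 : 0;
}

int main(void)
{
    const char *primary = "192.168.10.2";   /* example primary interconnect address */
    const char *backup  = "192.168.20.2";   /* example backup interconnect address  */
    const char *msg     = "coordination message for block 42";

    if (send_over(primary, msg) != 0 && send_over(backup, msg) != 0)
        fprintf(stderr, "both interconnect paths failed\n");
    return 0;
}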

Clusters - Nodes and the Interconnect

As described previously, each node uses a uniprocessor, SMP, or NUMA memory configuration. When configured with an interconnect, two or more of these systems make up a cluster. The performance of a clustered system can be limited by a number of factors, including memory bandwidth, CPU-to-CPU communication bandwidth, the memory available on each node, I/O bandwidth, and interconnect bandwidth.

Storage Access in Clustered Systems

Clustered systems use several architectural models. Each architecture uses a resource sharing scheme that is best suited to a particular purpose.

This section describes the following architectures:

Uniform Disk Access
Non-Uniform Disk Access

This type of storage access is independent of the type of memory access. For example, a cluster of SMP nodes may be configured with either uniform or non-uniform disk subsystems.

Uniform Disk Access

In uniform disk access systems, or shared disk systems, as shown in Figure 2-2, the cost of disk access is the same for all nodes.

Figure 2-2 Uniform Access Shared Disk System


The cluster in Figure 2-2 is composed of multiple SMP nodes. Shared disk subsystems like this are most often implemented using shared SCSI or Fibre Channel connections to a disk farm.

The advantages of using parallel processing on shared disk systems are:

Non-Uniform Disk Access

In some systems, the disk storage is attached to only one node. For that node, the access is local. For all other nodes, a request for disk access, as well as the data, must be forwarded by a software virtual disk layer over the interconnect to the node where the disk is locally attached. This means that the cost of a disk read or write varies significantly depending on whether the access is local or remote. The costs associated with reading or writing blocks on remote disks, including the interconnect latency and the IPC overhead, all make this type of operation more expensive than the same operation in a uniform disk access configuration.

Non-uniform disk access configurations are commonly used on systems known as "shared nothing systems" or "Massively Parallel Processing (MPP) systems". For high availability, if a node fails, its local disks can usually be reconfigured to be local to another node. For these non-uniform disk access systems, Oracle Parallel Server requires that the virtual disk layer be provided at the system level. In some cases it is much more efficient to move work to the node where the disk or other I/O device is locally attached rather than using remote requests. This ability to collocate processing with storage is known as "disk affinity" and is used by Oracle in a variety of areas including parallel execution and backup.
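
The following C sketch is a simplified illustration of the routing decision described above. It is not the system-level virtual disk layer itself; the request structure, node identifiers, and helper functions are invented. A request for a block on a locally attached disk is read directly, while a request for a block on another node's disk must be forwarded over the interconnect, where it incurs the additional latency and IPC overhead.

/*
 * Hypothetical sketch (not the actual virtual disk layer): route a block
 * read in a shared nothing configuration.  Local requests are served
 * directly; remote requests are forwarded to the node that owns the disk.
 */
#include <stdio.h>

struct disk_request {
    int  disk_id;     /* which disk holds the block */
    long block;       /* block number to read       */
};

static int owner_of(int disk_id) { return disk_id % 2; }  /* toy disk-to-node mapping */
static int local_node(void)      { return 0; }            /* this node's identifier   */

static void read_local(const struct disk_request *r)
{
    printf("local read: disk %d, block %ld\n", r->disk_id, r->block);
}

static void forward_to_owner(const struct disk_request *r, int owner)
{
    /* In a real system this travels over the interconnect, so the request
     * pays interconnect latency plus IPC overhead on both nodes. */
    printf("forward to node %d: disk %d, block %ld\n", owner, r->disk_id, r->block);
}

int main(void)
{
    struct disk_request requests[] = { { 0, 100 }, { 1, 200 } };

    for (int i = 0; i < 2; i++) {
        int owner = owner_of(requests[i].disk_id);
        if (owner == local_node())
            read_local(&requests[i]);
        else
            forward_to_owner(&requests[i], owner);
    }
    return 0;
}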

The advantages of using parallel processing on MPP or non-uniform disk access systems are:

Figure 2-3 illustrates a shared nothing system:

Figure 2-3 Non-uniform Disk Access


Oracle Parallel Server Runs on a Wide Variety of Clusters

Oracle Parallel Server is supported on a wide range of clustered systems from a number of different vendors. Architecturally, Oracle Parallel Server can support significantly more nodes in a cluster than any known implementation uses. A small system configured primarily for high availability may have only two nodes in the cluster, while a large configuration may have 40 to 50 nodes. In general, the cost of managing a cluster grows with the number of nodes in the system. The trend has been toward using a smaller number of nodes, with each node configured as a large SMP system using shared disks.

