HADB is a distributed system consisting of pairs of nodes. Nodes are divided into two data redundancy units (DRUs), with a node from each pair in each DRU, as illustrated in Data Redundancy Units.
Each node consists of:
A set of processes for transactional state replication
A dedicated area of shared memory used for communication among the processes.
One or more secondary storage devices (disks).
A set of HADB nodes can host one or more session databases. Each session database is associated with a distinct application server cluster. Deleting a cluster also deletes the associated session database.
For HADB hardware requirements, seeHardware and Software Requirements in Sun GlassFish Enterprise Server 2.1 Release Notes.
There are two types of HADB nodes:
Spare nodes that do not contain any data initially, but perform as active nodes if an active node becomes unavailable. Spare nodes are optional but useful for achieving higher availability.
Each node has a parent process and several child processses. The parent process, called the node supervisor (NSUP), is started by the management agent. It is responsible for creating the child processes and keeping them running.
The child processes are:
Transaction server process (TRANS), that coordinates transactions on distributed nodes, and manages data storage.
Relational algebra server process (RELALG) that coordinates and executes complex relational algebra queries such as sorts and and joins.
SQL shared memory server process (SQLSHM) that maintains the SQL dictionary cache.
SQL server process (SQLC), that receives client queries, compiles them into local HADB instructions, sends the instructions to TRANS, receives the results and conveys them to the client. Each node has one main SQL server and one sub-server for each client connection.
Node manager server process (NOMAN) that management agents use to execute management commands issued by the hadbm management client.
As previously described, an HADB instance contains a pair of DRUs. Each DRU has the same number of active and spare nodes as the other DRU in the pair. Each active node in a DRU has a mirror node in the other DRU. Due to mirroring, each DRU contains a complete copy of the database.
The following figure shows an example HADB architecture with six nodes: four active nodes and two spare nodes. Nodes 0 and 1 are a mirror pair, as are nodes 2 and 3. In this example, each host has one node. In general, a host can have more than one node if it has sufficient system resources (see System Requirements).
You must add machines that host HADB nodes in pairs, with one machine in each DRU.
HADB achieves high availability by replicating data and services. The data replicas on mirror nodes are designated as primary replicas and hot standby replicas. The primary replica performs operations such as inserts, deletes, updates, and reads. The hot standby replica receives log records of the primary replica’s operations and redoes them within the transaction life time. Read operations are performed only by the primary node and thus not logged. Each node contains both primary and hot standby replicas and plays both roles. The database is fragmented and distributed over the active nodes in a DRU. Each node in a mirror pair contains the same set of data fragments. Duplicating data on a mirror node is known as replication. Replication enables HADB to provide high availability: when a node fails, its mirror node takes over almost immediately (within seconds). Replication ensures availability and masks node failures or DRU failures without loss of data or services.
When a mirror node takes over the functions of a failed node, it has to perform double work: its own and that of the failed node. If the mirror node does not have sufficient resources, the overload will reduce its performance and increase its failure probability. When a node fails, HADB attempts to restart it. If the failed node does not restart (for example, due to hardware failure), the system continues to operate but with reduced availability.
HADB tolerates failure of a node, an entire DRU, or multiple nodes, but not a “double failure” when both a node and its mirror fail. For information on how to reduce the likelihood of a double failure, see Mitigating Double Failures
When a node fails, its mirror node takes over for it. If the failed node does not have a spare node, then at this point, the failed node will not have a mirror. A spare node will automatically replace a failed node’s mirror. Having a spare node reduces the time the system functions without a mirror node.
A spare node does not normally contain data, but constantly monitors for failure of active nodes in the DRU. When a node fails and does not recover within a specified timeout period, the spare node copies data from the mirror node and synchronizes with it. The time this takes depends on the amount of data copied and the system and network capacity. After synchronizing, the spare node automatically replaces the mirror node without manual intervention, thus relieveing the mirror node from overload, thus balancing load on the mirrors. This is known as failback or self-healing.
When a failed host is repaired (by shifting the hardware or upgrading the software) and restarted, the node or nodes running on it join the system as a spare nodes, since the original spare nodes are now active.
Spare nodes are not required, but they enable a system to maintain its overall level of service even if a machine fails. Spare nodes also make it easy to perform planned maintenance on machines hosting active nodes. Allocate one machine for each DRU to act as a spare machine, so that if one of the machines fails, the HADB system continues without adversely affecting performance and availability.
As a general rule, have a spare machine with enough Application Server instances and HADB nodes to replace any machine that becomes unavailable.
The following examples illustrate using spare nodes in HADB deployments. There are two possible deployment topologies: co-located, in which HADB and Application Servers reside on the same hosts, and separate tier , in which they reside on separate hosts. For more information on deployment topologies, see Chapter 3, Selecting a Topology
As an example of a spare node configuration, suppose you have a co-located topology with four Sun FireTM V480 servers, where each server has one Application Server instance and two HADB data nodes.
For spare nodes, allocate two more servers (one machine per DRU). Each spare machine runs one application server instance and two spare HADB nodes.
Suppose you have a separate-tier topology where the HADB tier has two Sun FireTM 280R servers, each running two HADB data nodes. To maintain this system at full capacity, even if one machine becomes unavailable, configure one spare machine for the Application Server instances tier and one spare machine for the HADB tier.
The spare machine for the Application Server instances tier must have as many instances as the other machines in the Application Server instances tier. Similarly, the spare machine for the HADB tier must have as many HADB nodes as the other machines in the HADB tier.