Sun Cluster 3.0 12/01 Concepts

Glossary

This glossary defines the terms used in the SunPlex 3.0 documentation.

A

administrative console

A workstation that is used to run cluster administrative software.

amnesia

A condition in which a cluster restarts after a shutdown with stale cluster configuration data (CCR). For example, on a two-node cluster with only node 1 operational, if a cluster configuration change occurs on node 1, node 2's CCR becomes stale. If the cluster is shut down then restarted on node 2, an amnesia condition results because of node 2's stale CCR.

automatic failback

The process of returning a resource group or device group to its primary node after the primary node fails and is later restarted as a cluster member.

B

backup group

See "Network Adapter Failover group."

C

checkpoint

The notification sent by a primary node to a secondary node to keep the software state synchronized between them. See also "primary" and "secondary."

cluster

Two or more interconnected nodes or domains that share a cluster file system and are configured together to run failover, parallel, or scalable resources.

Cluster Configuration Repository (CCR)

A highly available, replicated data store that is used by Sun Cluster software to persistently store cluster configuration information.

cluster file system

A cluster service that provides cluster-wide, highly available access to existing local file systems.

cluster interconnect

The hardware networking infrastructure that includes cables, cluster transport junctions, and cluster transport adapters. The Sun Cluster and data service software use this infrastructure for intra-cluster communication.

cluster member

An active member of the current cluster incarnation. This member is capable of sharing resources with other cluster members and providing services both to other cluster members and to clients of the cluster. See also "cluster node."

Cluster Membership Monitor (CMM)

The software that maintains a consistent cluster membership roster. This membership information is used by the rest of the clustering software to decide where to locate highly available services. The CMM ensures that non-cluster members cannot corrupt data or transmit corrupt or inconsistent data to clients.

cluster node

A node that is configured to be a cluster member. A cluster node might or might not be a current member. See also "cluster member."

cluster transport adapter

The network adapter that resides on a node and connects the node to the cluster interconnect. See also "cluster interconnect."

cluster transport cables

The network connection between endpoints: either between a cluster transport adapter and a cluster transport junction, or between two cluster transport adapters. See also "cluster interconnect."

cluster transport junction

A hardware switch that is used as part of the cluster interconnect. See also "cluster interconnect."

collocation

The property of being on the same node. This concept is used during cluster configuration to improve performance.

D

data service

An application that has been instrumented to run as a highly available resource under control of the Resource Group Manager (RGM).

default master

The default cluster member on which a failover resource type is brought online.

device group

A user-defined group of device resources, such as disks, that can be mastered from different nodes in a cluster HA configuration. This group can include device resources of disks, Solstice DiskSuite disksets, and VERITAS Volume Manager disk groups.

device id

A mechanism for identifying devices made available through the Solaris operating environment. Device ids are described in the devid_get(3DEVID) man page.

The Sun Cluster DID driver uses device ids to correlate the Solaris logical names used on different cluster nodes. The DID driver probes each device for its device id. If that device id matches the id of a device elsewhere in the cluster, both devices are given the same DID name. If the device id has not been seen in the cluster before, a new DID name is assigned. See also "Solaris logical name" and "DID driver."
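
The correlation that the DID driver performs can also be observed directly, because the devid_get(3DEVID) interfaces cited above are a public C API. The following is a minimal sketch, not part of Sun Cluster itself, that opens a raw device and prints its device id string; error handling is abbreviated, and the program must be linked with -ldevid on Solaris.

    /* Minimal sketch: print the device id string for a raw device.
     * Compile on Solaris with: cc devid_demo.c -ldevid */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <devid.h>

    int
    main(int argc, char **argv)
    {
        int         fd;
        ddi_devid_t devid;
        char       *str;

        if (argc != 2) {
            (void) fprintf(stderr, "usage: %s raw-device\n", argv[0]);
            return (1);
        }
        if ((fd = open(argv[1], O_RDONLY)) < 0) {
            perror("open");
            return (1);
        }
        if (devid_get(fd, &devid) != 0) {   /* fetch the device id */
            (void) fprintf(stderr, "no device id for %s\n", argv[1]);
            (void) close(fd);
            return (1);
        }
        /* Encode the id as a printable string. The DID driver compares
         * ids like this one across nodes to assign a common DID name. */
        str = devid_str_encode(devid, NULL);
        if (str != NULL) {
            (void) printf("%s: %s\n", argv[1], str);
            devid_str_free(str);
        }
        devid_free(devid);
        (void) close(fd);
        return (0);
    }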

DID driver

A driver, implemented by the Sun Cluster software, that provides a consistent device namespace across the cluster. See also "DID name."

DID name

The name used to identify global devices in a SunPlex system: a clustering identifier with a one-to-one or one-to-many relationship with Solaris logical names. It takes the form dXsY, where X is an integer and Y is the slice name. See also "Solaris logical name."

disk device group

See "device group."

disk group

See "device group."

diskset

See "device group."

Distributed Lock Manager (DLM)

The locking software used in a shared disk Oracle Parallel Server (OPS) environment. The DLM enables Oracle processes running on different nodes to synchronize database access. The DLM is designed for high availability. If a process or node crashes, the remaining nodes do not have to be shut down and restarted. A quick reconfiguration of the DLM is performed to recover from such a failure.

E

endpoint

A physical port on a cluster transport adapter or cluster transport junction.

event

A change in the state, mastery, severity, or description of a managed object.

F

failback

See "automatic failback."

failfast

The orderly shutdown and removal from the cluster of a faulty node before its potentially incorrect operation can prove damaging.

failover

The automatic relocation of a resource group or a device group from a current primary node to a new primary node after a failure has occurred.

failover resource

A resource whose instances can be mastered correctly by only one node at a time. See also "single instance resource" and "scalable resource."

fault monitor

A fault daemon and the programs used to probe various parts of data services and take action. See also "resource monitor."

G

generic resource

An application daemon and its child processes placed under the control of the Resource Group Manager as part of a generic resource type.

generic resource type

A template for a data service. A generic resource type can be used to make a simple application into a failover data service (stop on one node, start on another) without programming to the SunPlex API.

global device

A device that is accessible from all cluster members, such as a disk, CD-ROM, or tape device.

global device namespace

A namespace that contains the logical, cluster-wide names for global devices. Local devices in the Solaris environment are defined in the /dev/dsk, /dev/rdsk, and /dev/rmt directories. The global device namespace defines global devices in the /dev/global/dsk, /dev/global/rdsk, and /dev/global/rmt directories.

global interface

A global network interface that physically hosts shared addresses. See also "shared address."

global interface node

A node hosting a global interface.

global resource

A highly available resource provided at the kernel level of the Sun Cluster software. Global resources can include disks (HA device groups), the cluster file system, and global networking.

H

HA data service

See "data service."

heartbeat

A periodic message sent across all available cluster interconnect transport paths. Lack of a heartbeat after a specified interval and number of retries might trigger an internal failover of transport communication to another path. Failure of all paths to a cluster member causes the CMM to reevaluate the cluster quorum.
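
As a purely conceptual illustration (not the Sun Cluster implementation, and using hypothetical names and values), the missed-heartbeat decision amounts to a timeout multiplied by a retry budget:

    /* Conceptual sketch only: declaring a transport path faulty after
     * a heartbeat timeout and retry budget are exhausted. All names
     * and constants here are hypothetical. */
    #include <stdbool.h>

    #define HEARTBEAT_TIMEOUT_MS 10000 /* interval without a heartbeat */
    #define HEARTBEAT_RETRIES    3     /* missed intervals tolerated   */

    /* True when a path should fail over to another path. Losing all
     * paths to a member makes the CMM reevaluate cluster quorum. */
    bool
    path_is_faulty(long now_ms, long last_heartbeat_ms)
    {
        return (now_ms - last_heartbeat_ms >
            (long)HEARTBEAT_TIMEOUT_MS * HEARTBEAT_RETRIES);
    }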

I

instance

See "resource invocation."

L

load balancing

Applies only to scalable services. The process of distributing the application load across nodes in the cluster so that client requests are serviced in a timely manner. Refer to "Scalable Data Services" for more details.

load-balancing policy

Applies only to scalable services. The preferred way in which application request load is distributed across nodes. Refer to "Scalable Data Services" for more details.
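
One way to picture a load-balancing policy is a weighted distribution, in which each node receives a share of the requests proportional to an administrator-assigned weight. The sketch below illustrates that idea only; it is not the Sun Cluster implementation, and all names in it are hypothetical.

    /* Illustrative weighted selection: dispatch a request to node i
     * with probability weights[i] / sum(weights). The hash argument
     * might be derived, for example, from the client's IP address. */
    #include <stddef.h>

    int
    pick_node(const int weights[], size_t nnodes, unsigned int hash)
    {
        int    remaining, total = 0;
        size_t i;

        for (i = 0; i < nnodes; i++)
            total += weights[i];
        if (total <= 0)
            return (0);           /* degenerate case: no weights */

        remaining = (int)(hash % (unsigned int)total);
        for (i = 0; i < nnodes; i++) {
            remaining -= weights[i];
            if (remaining < 0)
                return ((int)i);  /* dispatch to node i */
        }
        return (0);               /* not reached when total > 0 */
    }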

local disk

A disk that is physically private to a given cluster node.

logical host

A concept from Sun Cluster 2.0 and later 2.x releases that includes an application, the disksets or disk groups on which the application data resides, and the network addresses used to access the cluster. This concept no longer exists in the SunPlex system. Refer to "Disk Device Groups" and "Resources, Resource Groups, and Resource Types" for a description of how this concept is now implemented in the SunPlex system.

logical hostname resource

A resource that contains a collection of logical hostnames representing network addresses. A logical hostname resource can be mastered by only one node at a time. See also "logical host."

logical network interface

In the Internet architecture, a host can have one or more IP addresses. Sun Cluster software configures additional logical network interfaces to establish a mapping between several logical network interfaces and a single physical network interface. Each logical network interface has a single IP address. This mapping enables a single physical network interface to respond to multiple IP addresses. It also enables an IP address to move from one cluster member to another in the event of a takeover or switchover without requiring additional hardware interfaces.

M

master

See "primary."

metadevice state database replica (replica)

A database, stored on disk, that records the configuration and state of all metadevices, as well as error conditions. Because this information is important to the correct operation of Solstice DiskSuite disksets, it is replicated.

multihomed host

A host that is on more than one public network.

multihost disk

A disk that is physically connected to multiple nodes.

N

Network Adapter Failover (NAFO) group

A set of one or more network adapters on the same node and on the same subnet configured to back up each other in the event of an adapter failure.

network address resource

See "network resource."

network resource

A resource that contains one or more logical hostnames or shared addresses. See also "logical hostname resource" and "shared address resource."

node

A physical machine or domain (in the Sun Enterprise E10000 server) that can be part of a SunPlex system. Also called "host."

non-cluster mode

The state achieved by booting a cluster node with the -x boot option. In this state the node is not a cluster member, but it is still a cluster node. See also "cluster member" and "cluster node."

P

parallel resource type

A resource type, such as a parallel database, that has been instrumented to run in a cluster environment so that it can be mastered by multiple (two or more) nodes simultaneously.

parallel service instance

An instance of a parallel resource type running on an individual node.

potential master

See "potential primary."

potential primary

A cluster member that is able to master a failover resource type if the primary node fails. See also "default master."

primary

A node on which a resource group or device group is currently online. That is, a primary is a node that is currently hosting or implementing the service associated with the resource. See also "secondary."

primary host name

The name of a node on the primary public network. This is always the node name specified in /etc/nodename. See also "secondary host name."

private hostname

The hostname alias used to communicate with a node over the cluster interconnect.

Public Network Management (PNM)

Software that uses fault monitoring and failover to prevent loss of node availability because of single network adapter or cable failure. PNM failover uses sets of network adapters called Network Adapter Failover groups to provide redundant connections between a cluster node and the public network. The fault monitoring and failover capabilities work together to ensure availability of resources. See also "Network Adapter Failover group."

Q

quorum device

A disk shared by two or more nodes that contributes votes used to establish a quorum for the cluster to run. The cluster can operate only when a quorum of votes is available. The quorum device is used when a cluster becomes partitioned into separate sets of nodes to establish which set of nodes constitutes the new cluster.
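
The arithmetic behind quorum is a strict majority of the configured votes. The sketch below illustrates only that rule and is not the Sun Cluster implementation. For example, in a two-node cluster where each node has one vote and a quorum device contributes a third, the partition that reserves the quorum device holds two of three votes and continues as the cluster, while the other node's single vote is not a quorum.

    /* Illustration of the majority-vote rule behind quorum. A
     * partition continues as the cluster only when the votes it holds
     * (member nodes plus any quorum devices it has reserved) exceed
     * half of all configured votes. */
    #include <stdbool.h>

    bool
    partition_has_quorum(int partition_votes, int total_configured_votes)
    {
        /* Strict majority: 2 * partition_votes > total votes. */
        return (2 * partition_votes > total_configured_votes);
    }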

R

resource

An instance of a resource type. Many resources of the same type might exist, each resource having its own name and set of property values, so that many instances of the underlying application might run on the cluster.

resource group

A collection of resources that are managed by the RGM as a unit. Each resource that is to be managed by the RGM must be configured in a resource group. Typically, related and interdependent resources are grouped.

Resource Group Manager (RGM)

A software facility used to make cluster resources highly available and scalable by automatically starting and stopping these resources on selected cluster nodes. The RGM acts according to preconfigured policies in the event of hardware or software failures or reboots.

resource group state

The state of the resource group on any given node.

resource invocation

An instance of a resource type running on a node. An abstract concept representing a resource that was started on the node.

Resource Management API (RMAPI)

The application programming interface within a SunPlex system that makes an application highly available in a cluster environment.

resource monitor

An optional part of a resource type implementation that runs periodic fault probes on resources to determine if they are running correctly and how they are performing.

resource state

The state of a Resource Group Manager resource on a given node.

resource status

The condition of a resource as reported by its fault monitor.

resource type

The unique name given to a data service, LogicalHostname, or SharedAddress cluster object. Data service resource types can either be failover types or scalable types. See also "data service," "failover resource," and "scalable resource."

resource type property

A key-value pair, stored by the RGM as part of the resource type, that is used to describe and manage resources of the given type.

S

Scalable Coherent Interface (SCI)

High-speed interconnect hardware that can be used as the cluster interconnect.

scalable resource

A resource that runs on multiple nodes (one instance per node) and uses the cluster interconnect to present the appearance of a single service to remote clients of the service.

scalable service

A data service that runs on multiple nodes simultaneously.

secondary

A cluster member that is available to master disk device groups and resource groups in the event that the primary fails. See also "primary."

secondary host name

The name used to access a node on a secondary public network. See also "primary host name."

shared address resource

A network address to which scalable services running on nodes within the cluster can bind, enabling those services to scale on those nodes. A cluster can have multiple shared addresses, and a service can be bound to multiple shared addresses.

single instance resource

A resource of which at most one instance may be active across the cluster.

Solaris logical name

The name typically used to manage a Solaris device. For a disk, this usually looks something like /dev/rdsk/c0t2d0s2. For each Solaris logical device name, there is an underlying Solaris physical device name. See also "DID name" and "Solaris physical name."

Solaris physical name

The name given to a device by its device driver in the Solaris environment. This name appears on a Solaris machine as a path under the /devices tree. For example, a typical SCSI disk has a Solaris physical name similar to /devices/sbus@1f,0/SUNW,fas@e,8800000/sd@6,0:c,raw

See also "Solaris logical name."

Solstice DiskSuite

A volume manager used by the SunPlex system. See also "volume manager."

split brain

A condition in which a cluster breaks up into multiple partitions, with each partition forming without knowledge of the existence of any other.

Sun Cluster (software)

The software portion of the SunPlex system. See also "SunPlex."

SunPlex

The integrated hardware and Sun Cluster software system that is used to create highly available and scalable services.

switchback

See "failback."

switchover

The orderly transfer of a resource group or device group from one master (node) in a cluster to another master (or multiple masters, if resource groups are configured for multiple primaries). A switchover is initiated by an administrator by using the scswitch(1M) command.

System Service Processor (SSP)

In Enterprise 10000 configurations, a device, external to the cluster, used specifically to communicate with cluster members.

T

takeover

See "failover."

terminal concentrator

In non-Enterprise 10000 configurations, a device that is external to the cluster, used specifically to communicate with cluster members.

V

VERITAS Volume Manager

A volume manager used by the SunPlex system. See also "volume manager."

volume manager

A software product that provides data reliability through disk striping, concatenation, mirroring, and dynamic growth of metadevices or volumes.