This section provides a detailed description of each failover detection point identified in Table 10-2:
Main SSP to Domains Failure
The main SSP detects this failure of the public network interface on the main SSP to the domains and initiates an SSP failover.
The public network interface failure is not fatal to the main SSP, but it affects dynamic reconfiguration (DR), Sun Enterprise Cluster, and Sun Management Center operations. This failure:
Prevents DR operations from communicating with the DR daemons in the active domains
Restricts netcon sessions to the JTAG interface
Prevents the net booting of the SSP
Makes the CD-ROM inaccessible
Prevents the main SSP in a Sun Enterprise Cluster configuration from shutting down cluster nodes in a split-brain situation, which could allow a potential corruption of the cluster database
Prevents Sun Management Center from querying domains about their current state and configuration
The fod daemon monitors connections between the SSPs and the Sun Enterprise 10000 domains less frequently than the connections between the SSPs and the control boards. If the main SSP cannot communicate with the domains, but the spare SSP can communicate with some or all of the domains, this failure condition must persist for 25 minutes before a failover is triggered. After 25 minutes, the fod daemon initiates a failover, provided that the spare SSP can communicate with the primary control board and has sufficient memory and disk space.
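The timing rule above can be sketched as a small decision function. This is an illustrative Python model only, not the fod daemon's implementation: the class, field, and function names are hypothetical, and only the 25-minute threshold and the preconditions come from the text. The case where the spare SSP also cannot reach any domain is not described here, so the sketch assumes no failover in that case.

```python
from dataclasses import dataclass

# Threshold stated in the text: the condition must persist for 25 minutes.
DOMAIN_FAILURE_PERSIST_SECONDS = 25 * 60


@dataclass
class SspState:
    """Hypothetical snapshot of monitored state (all names are illustrative)."""
    main_reaches_domains: bool       # main SSP can communicate with the domains
    spare_reaches_any_domain: bool   # spare SSP can reach some or all domains
    spare_reaches_primary_cb: bool   # spare SSP can reach the primary control board
    spare_has_resources: bool        # spare SSP has sufficient memory and disk space
    failure_age_seconds: float       # how long the main-to-domains failure has lasted


def should_fail_over(state: SspState) -> bool:
    """Return True when the rule described above would trigger an SSP failover."""
    if state.main_reaches_domains:
        return False  # no main-to-domains failure is present
    if not state.spare_reaches_any_domain:
        return False  # assumption: this case is not covered by the text
    if state.failure_age_seconds < DOMAIN_FAILURE_PERSIST_SECONDS:
        return False  # the failure has not yet persisted for 25 minutes
    # Both remaining preconditions on the spare SSP must hold.
    return state.spare_reaches_primary_cb and state.spare_has_resources
```

Modeling the preconditions as plain booleans keeps the persistence check separate from however connectivity and resources are actually measured.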
Spare SSP to Domain Failure
The spare SSP detects this failure of the public network interface on the spare SSP to the domains. This public interface failure does not cause a loss in critical SSP functionality, but it can affect dynamic reconfiguration, Sun Remote Services (SRS), Sun Management Center, and the Sun Cluster console.
As a result, SSP failover is disabled.
Main SSP Failure
A failure in the main SSP can be caused by the following:
The depletion of SSP resources, such as virtual memory or disk space. The main SSP detects this failure and initiates a failover.
A system crash, which is detected by the spare SSP and the control boards. The spare SSP initiates the failover.
Spare SSP Failure
Both control boards and the main SSP detect this spare SSP failure. This failure disables SSP failover.
Main SSP to Spare Hub Failure
Both SSPs detect this failure of the control board network connection from the main SSP to the spare hub and spare control board. Both SSP and control board failover are disabled.
Spare SSP to Main Hub Failure
Both SSPs and the primary control board detect this failure of the control board network connection from the spare SSP to the main hub and primary control board.
SSP failover is disabled because the spare SSP cannot monitor the main SSP as required.
Main SSP to Main Hub Failure
Both SSPs and the primary control board detect this failure of the control board network connection from the main SSP to the main hub and primary control board. When connectivity from the spare SSP to the primary control board is verified, an SSP failover is attempted. If the SSP failover is unsuccessful, a control board failover occurs instead.
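The fallback order described above (verify connectivity, attempt an SSP failover, fall back to a control board failover) can be sketched as follows. This is a hypothetical Python model, not SSP software: the function name and the three callables are assumptions introduced for illustration. The text does not say what happens when connectivity from the spare SSP cannot be verified, so the sketch reports that case as "unspecified".

```python
from typing import Callable


def handle_main_ssp_to_main_hub_failure(
    spare_reaches_primary_cb: Callable[[], bool],
    attempt_ssp_failover: Callable[[], bool],
    do_control_board_failover: Callable[[], None],
) -> str:
    """Model the recovery order for a main SSP to main hub failure.

    All three callables are hypothetical hooks supplied by the caller.
    """
    # Step 1: connectivity from the spare SSP to the primary control
    # board must be verified before an SSP failover is attempted.
    if not spare_reaches_primary_cb():
        return "unspecified"  # not described in the text

    # Step 2: attempt the SSP failover; True means it succeeded.
    if attempt_ssp_failover():
        return "ssp-failover"

    # Step 3: the SSP failover was unsuccessful, so a control board
    # failover occurs instead.
    do_control_board_failover()
    return "control-board-failover"
```

Passing the checks and actions in as callables lets the ordering be exercised without any real hardware or daemons.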
Spare SSP to Spare Hub Failure
Both SSPs and the spare control board detect this failure of the control board network connection from the spare SSP to the spare hub and spare control board. SSP failover is disabled.
Main Hub Failure
Both SSPs and the primary control board detect this failure of the main hub and all connections to the primary control board. If connectivity to the domains exists and the domains are running, this failure causes a partial control board failover to the spare control board (JTAG failover only). If no domains are currently running, this failure causes a complete control board failover (JTAG and system clock failover).
If a partial control board failover occurs, note that full control board functionality is retained, even though the JTAG interface and system clock are split between the primary and spare control boards.
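The choice between a partial and a complete control board failover can be modeled as below. This is a simplified Python sketch with hypothetical names: it reduces the condition to a single "domains running" flag and records which control board would hold each function afterward, as implied by the JTAG-only description above.

```python
from enum import Enum
from typing import Dict


class ControlBoardFailover(Enum):
    PARTIAL = "JTAG only"
    FULL = "JTAG and system clock"


def failover_scope(domains_running: bool) -> ControlBoardFailover:
    """Running domains lead to a partial failover; otherwise it is complete."""
    if domains_running:
        return ControlBoardFailover.PARTIAL
    return ControlBoardFailover.FULL


def function_owners(scope: ControlBoardFailover) -> Dict[str, str]:
    """Return which control board holds each function after the failover."""
    # The JTAG interface moves to the spare control board in both cases.
    owners = {"jtag": "spare"}
    # The system clock moves only in a complete failover; in a partial
    # failover it stays on the primary control board (the split noted above).
    if scope is ControlBoardFailover.FULL:
        owners["system_clock"] = "spare"
    else:
        owners["system_clock"] = "primary"
    return owners
```

For example, `function_owners(failover_scope(True))` yields a JTAG interface on the spare board and a system clock on the primary board.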
Spare Hub Failure
Both SSPs and the spare control board detect this failure of the spare hub and all connections to the spare control board.
Primary Control Board to Main Hub Failure
Both SSPs and the primary control board detect this failure of the control board network connection from the main hub to the primary control board. If domains are running, this failure causes a partial control board failover (JTAG only) to the spare control board. If no domains are running, this failure causes a full control board failover.
If a partial control board failover occurs, note that full control board functionality is retained, even though the JTAG interface and system clock are split between the primary and spare control boards.
Spare Control Board to Spare Hub Failure
Both SSPs and the spare control board detect this failure of the control board network connection from the spare hub to the spare control board. This failure disables control board failover.
Primary Control Board Failure
Both SSPs detect this failure. If domains are running, this failure causes a partial control board failover (JTAG only) to the spare control board. If no domains are running, this failure causes a full control board failover.
If a partial control board failover occurs, note that full control board functionality is retained, even though the JTAG interface and system clock are split between the primary and spare control boards.
Spare Control Board Failure
Both SSPs detect this failure, which disables control board failover.
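As a quick reference, the detection points in this section can be collected into a lookup table. The Python sketch below is only a reader's summary of the descriptions above; the structure, the effect labels, and the helper function are hypothetical, and an effect of None marks a failure for which this section states no effect.

```python
from typing import Dict, List, NamedTuple, Optional


class DetectionPoint(NamedTuple):
    detected_by: str
    effect: Optional[str]  # None where this section states no effect


DETECTION_POINTS: Dict[str, DetectionPoint] = {
    "Main SSP to Domains Failure": DetectionPoint(
        "main SSP", "SSP failover"),
    "Spare SSP to Domain Failure": DetectionPoint(
        "spare SSP", "SSP failover disabled"),
    "Main SSP Failure": DetectionPoint(
        "main SSP, or spare SSP and control boards", "SSP failover"),
    "Spare SSP Failure": DetectionPoint(
        "both control boards and main SSP", "SSP failover disabled"),
    "Main SSP to Spare Hub Failure": DetectionPoint(
        "both SSPs", "SSP and control board failover disabled"),
    "Spare SSP to Main Hub Failure": DetectionPoint(
        "both SSPs and primary control board", "SSP failover disabled"),
    "Main SSP to Main Hub Failure": DetectionPoint(
        "both SSPs and primary control board",
        "SSP failover, else control board failover"),
    "Spare SSP to Spare Hub Failure": DetectionPoint(
        "both SSPs and spare control board", "SSP failover disabled"),
    "Main Hub Failure": DetectionPoint(
        "both SSPs and primary control board", "control board failover"),
    "Spare Hub Failure": DetectionPoint(
        "both SSPs and spare control board", None),
    "Primary Control Board to Main Hub Failure": DetectionPoint(
        "both SSPs and primary control board", "control board failover"),
    "Spare Control Board to Spare Hub Failure": DetectionPoint(
        "both SSPs and spare control board",
        "control board failover disabled"),
    "Primary Control Board Failure": DetectionPoint(
        "both SSPs", "control board failover"),
    "Spare Control Board Failure": DetectionPoint(
        "both SSPs", "control board failover disabled"),
}


def points_with_effect(effect: str) -> List[str]:
    """List the detection points whose summarized effect matches exactly."""
    return [name for name, point in DETECTION_POINTS.items()
            if point.effect == effect]
```

Calling `points_with_effect("SSP failover disabled")` returns the four spare-side failures that disable SSP failover without affecting control board failover.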