Sun Fire B1600 Blade System Chassis Software Setup Guide
The Active and Standby System Controllers
This appendix explains the relationship between the chassis's active and standby System Controllers when two SSCs are installed. It also describes the limitations of this relationship.
E.1 The Events That Cause a Failover
The blade system chassis contains two System Controllers. Only one of these is active at any given time, so only one can be accessed through the ALOM command-line interface. However, even though the other System Controller is quiesced (in other words, in standby mode), its associated switch remains active, and the standby System Controller can take over as the active System Controller in the event of:
- the removal of the currently active System Controller,
- a major failure of the System Controller software application on the active System Controller, or a hardware fatal error,
- an execution of the setfailover command by the user to force the System Controllers to swap roles.
E.2 The Activities of the Standby System Controller
The standby System Controller performs the following activities despite its main software application being in a quiesced state:
- Monitors the health of the currently active System Controller and takes over if that System Controller is physically removed, if a major failure of its main software application occurs, if a hardware fatal error occurs, or in response to the use of the setfailover command on the active System Controller.
- Receives the configuration parameters that the user enters for the setupsc command on the active System Controller. (This enables it to take over transparently as the active System Controller.)
- Receives all event messages so that event logs on the standby System Controller are always up to date.
- Permits console access from the active System Controller to the switch in the SSC module containing the standby System Controller. (Note that if the booting of the standby System Controller is interrupted for any reason, the standby System Controller cannot provide console access to its associated switch.)
- Helps maintain the integrity of the user login and host ID information for the chassis as a whole. (The host ID information is required for the server blades; the user login information is required for the System Controllers.) These two sets of information are stored mainly on the midplane. However, the two System Controllers are involved in their preservation.
In the case where a new SSC (in its factory default state) is introduced into a chassis that is already in use, the new SSC simply inherits the user login and host ID information that is currently stored on the midplane.
In the reverse case, where the chassis is new (and its user login and host ID information are therefore unconfigured) but the SSC has been previously in use, the midplane takes the user login and host ID information from the System Controller.
However, where an SSC is introduced into a chassis and both already contain user login and host ID information, but the SSC and the chassis differ in respect of either or both, the outcome is harder to predict. In this case the standby System Controller, if it is available, plays an arbitrating role. It compares its own user login and host ID information with the information held on the SSC containing the active System Controller and with the information held on the midplane. If its own host ID information agrees with that stored on either the active SSC or the midplane, then that information prevails. Similarly, if its own user login information agrees with that stored on either the active SSC or the midplane, then that information prevails. For each set of information, if the standby System Controller finds that its own data differs from that of both the active SSC and the midplane, then the data on the midplane prevails.
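The arbitration rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic as stated in this section, not the System Controller's actual implementation. The names ChassisInfo, arbitrate, and resolve are hypothetical, and the sketch assumes that the standby System Controller is available and that each set of information can be compared as a single value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChassisInfo:
    """Hypothetical container for the two sets of information being arbitrated."""
    host_id: str
    user_login: str


def arbitrate(standby: str, active: str, midplane: str) -> str:
    """Return the value that prevails for one set of information.

    Assumes the standby System Controller is available to arbitrate.
    """
    if standby == active or standby == midplane:
        # The standby's copy agrees with at least one other copy,
        # so the agreed value prevails.
        return standby
    # The standby's copy differs from both others: the midplane prevails.
    return midplane


def resolve(standby: ChassisInfo, active: ChassisInfo,
            midplane: ChassisInfo) -> ChassisInfo:
    """Arbitrate host ID and user login independently of each other."""
    return ChassisInfo(
        host_id=arbitrate(standby.host_id, active.host_id, midplane.host_id),
        user_login=arbitrate(standby.user_login, active.user_login,
                             midplane.user_login),
    )
```

Because the two sets of information are arbitrated separately, the host ID that prevails can come from one source while the user login comes from another.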
E.3 Limitations of the Failover Relationship Between the Two System Controllers
There is no impact on the running of the server blades or switches during the failover process. However, you need to be aware that:
- When one System Controller takes over from the other, the chassis is temporarily (for approximately 15 seconds) without an active System Controller. (This is because both System Controllers are reset as part of the failover process.) As a consequence, no console logs are gathered during the failover, and when you log into the new active System Controller all the event logs on both System Controllers will be empty.
- During the failover process, no user management of any of the chassis's components is possible via the System Controllers. It is, however, still possible to telnet into the switches or blades, and it is still possible to use the switch's web-based graphical user interface.
- During the failover process, it is not possible to perform any upgrades of the firmware on the components of the chassis.
- To upgrade System Controller firmware, you must first make the System Controller that you want to upgrade the standby one (if it is not already the standby). To do this, use the setfailover command at the sc> prompt on the currently active System Controller.
- Telnet access to the standby System Controller is not permitted. Use the alias IP address instead. Be aware, however, that telnet connections are dropped when failover takes place from one System Controller to the other.
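Because telnet connections are dropped during a failover and the chassis has no active System Controller for roughly 15 seconds, a management script may need to wait and reconnect. The following Python sketch shows one way to do this under stated assumptions: the function name wait_for_active_sc is hypothetical, the alias IP address in the usage note is a placeholder, port 23 is the standard telnet port, and the retry interval and overall timeout are arbitrary choices rather than values taken from this guide.

```python
import socket
import time


def wait_for_active_sc(host, port=23, timeout_s=60.0, interval_s=2.0,
                       connect=socket.create_connection, sleep=time.sleep):
    """Retry a TCP connection to the alias IP address until it succeeds.

    Returns the connected socket, or raises TimeoutError if no connection
    could be made within timeout_s seconds. The connect and sleep
    parameters exist only so the function can be exercised without a
    real chassis.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Each attempt gets its own short timeout (5 s is an assumption).
            return connect((host, port), timeout=5.0)
        except OSError:
            # Connection refused, unreachable, or timed out: the System
            # Controllers may still be resetting.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"no System Controller reachable at {host}:{port}")
            sleep(interval_s)
```

A caller might use wait_for_active_sc("192.0.2.10") after issuing setfailover, where 192.0.2.10 stands in for the chassis's alias IP address, and then log in again on the returned connection.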
817-4603-11
Copyright © 2004, Sun Microsystems, Inc. All rights reserved.