The Sun Blade 8000 Series includes many blade-centric and chassis-wide features that increase reliability, availability, and serviceability (RAS). RAS features are aspects of a system's design that affect its ability to operate continuously and that minimize the time needed to service it. Reliability refers to the system's ability to operate continuously without failures and to maintain data integrity. Availability refers to the system's ability to recover to an operational state after a failure, with minimal impact. Serviceability relates to the time it takes to restore a system to service following a component failure. Together, the RAS features of the Sun Blade 8000 Series provide for near-continuous operation.
This topic includes the following sections:
Hot-Pluggable Components
Redundant Components
Environmental Monitoring
Error Correction and Parity
RAS Features Summary
Sun Blade 8000 Series hardware supports hot-plugging of the chassis-mounted Sun Blade Server Modules (blades), Sun Blade 8000 Network Express Modules, PCI Express ExpressModules, Chassis Monitoring Modules, fan modules, power supply modules, and hard disk drives. Using the proper software commands, you can install or remove these components while the system is running. Hot-plug technology significantly increases the system's serviceability and availability by enabling you to replace these components without service disruption. For more information, see About Hot-Pluggable Components.
The Sun Blade 8000 Series provides redundant components that enable the system to continue operating if one of the associated components fails. This separation of functions minimizes the impact of component problems and servicing. The redundant components include the following:
Server Modules (blades), depending on system configuration
Power supply modules
PCI Express ExpressModules (Sun Blade 8000 Chassis only)
Network Express Modules
Chassis Monitoring Modules
System fans
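The effect of redundancy on availability can be illustrated with a short calculation. The following is a minimal sketch, not a Sun sizing tool: it assumes the standard steady-state availability formula MTBF / (MTBF + MTTR) and assumes that redundant units fail independently, which real hardware only approximates. The function names and the example figures are hypothetical and are not Sun Blade 8000 specifications.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a single component is operational.

    mtbf_hours: mean time between failures (reliability).
    mttr_hours: mean time to repair (serviceability).
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(unit_availability: float, units: int) -> float:
    """Availability of a group that stays up while at least one unit works.

    Assumes failures are independent, so the group is down only when
    every unit is down at the same time.
    """
    return 1.0 - (1.0 - unit_availability) ** units


# Hypothetical figures for illustration only.
single = steady_state_availability(mtbf_hours=50_000, mttr_hours=4)
pair = redundant_availability(single, units=2)
print(f"single unit: {single:.6f}, redundant pair: {pair:.10f}")
```

In this model, hot-plugging helps by shortening the repair time, and redundancy helps by keeping the function available while the failed unit is being replaced.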
The Sun Blade 8000 Series features an environmental monitoring subsystem designed to protect components against the following:
Extreme temperatures
Lack of adequate airflow throughout the system
Power supply failures
Hardware faults
Temperature sensors located throughout the system monitor the ambient temperature of the chassis and internal components. The software and hardware ensure that the temperatures within the chassis do not exceed predetermined safe operating ranges. If the temperature observed by a sensor falls below or rises above a set threshold, the monitoring software subsystem lights the amber Service Required indicators on the front and back of the system. If the temperature condition persists and reaches a critical threshold, the system may initiate a graceful system shutdown.
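The threshold behavior described above can be sketched as a simple classification. This is an illustration only, not the Sun ILOM implementation: the function name, the enum, and the threshold values are hypothetical, and the sketch ignores the requirement that the condition persist before a shutdown is considered.

```python
from enum import Enum


class ThermalAction(Enum):
    NORMAL = "no action"
    SERVICE_REQUIRED = "light amber Service Required indicators"
    SHUTDOWN = "may initiate graceful shutdown"


def classify_temperature(reading_c, warning_range, critical_range):
    """Map one sensor reading to the action the monitoring subsystem takes.

    warning_range and critical_range are (low, high) tuples in degrees
    Celsius; the critical range is assumed to enclose the warning range.
    """
    warn_low, warn_high = warning_range
    crit_low, crit_high = critical_range
    if reading_c <= crit_low or reading_c >= crit_high:
        return ThermalAction.SHUTDOWN
    if reading_c < warn_low or reading_c > warn_high:
        return ThermalAction.SERVICE_REQUIRED
    return ThermalAction.NORMAL


# Hypothetical thresholds, not values taken from the product documentation.
WARNING = (5, 40)
CRITICAL = (0, 55)
for reading in (25, 45, 60):
    print(reading, classify_temperature(reading, WARNING, CRITICAL).value)
```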
All error and warning messages are sent to the Chassis Monitoring Module (CMM) and are logged in the Sun ILOM log file. Additionally, some customer-replaceable units (CRUs), such as power supplies, fans, and DIMMs, provide LEDs that indicate a failure within the CRU.
The AMD dual-core processors on the Sun Blade X8400, X8420, and X8440 Server Modules (blades) and the Intel quad-core processor on the X8450 Server Module provide parity protection on internal cache memories and error-correcting code (ECC) protection of the data. The system can detect the following types of errors and log them to the system event log (SEL):
Correctable and uncorrectable memory ECC errors
SP correctable memory ECC errors
Correctable and uncorrectable CPU internal errors
Faults in the chassis shared infrastructure, including fan and power supply faults
Advanced ECC corrects up to 4 bits in error on nibble boundaries, as long as they are all in the same DRAM. If a DRAM fails, the DIMM or FBDIMM continues to function.
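The correctability condition stated above (all bits in error confined to one nibble from one DRAM) can be expressed as a small check. This sketch models only that condition and does not implement the ECC code itself. It assumes that each DRAM contributes one nibble-aligned 4-bit symbol to the protected word; that mapping, the function name, and the parameters are assumptions made for illustration.

```python
def is_correctable(error_mask: int, symbol_bits: int = 4) -> bool:
    """Return True if every flipped bit lies inside one aligned symbol.

    error_mask has a 1 at each bit position that is in error.
    symbol_bits is the width contributed by one DRAM (assumed 4 here).
    """
    if error_mask == 0:
        return True  # no bits in error, nothing to correct
    # Index of the lowest bit in error.
    lowest = (error_mask & -error_mask).bit_length() - 1
    # Start of the aligned symbol that contains that bit.
    symbol_start = (lowest // symbol_bits) * symbol_bits
    symbol_mask = ((1 << symbol_bits) - 1) << symbol_start
    # Correctable only if no error bit falls outside that symbol.
    return error_mask & ~symbol_mask == 0


print(is_correctable(0b1111 << 8))   # four bits, one nibble
print(is_correctable(0b0001_1000))   # two bits straddling a nibble boundary
```

Under this model, a pattern of four adjacent bits that crosses a nibble boundary is not covered, because it would involve two DRAMs rather than one.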
Feature | Description |
---|---|
Power supplies | Hot-pluggable; integrated into the chassis, making the blades more reliable |
Airflow and cooling | Fans are integrated into the chassis, making the fans, blades, and power supplies more reliable. For the Sun Blade 8000 Chassis: For the Sun Blade 8000 P Chassis: |
Server Modules (blades) | Hot-pluggable; servicing can be done without affecting cabling or I/O configuration |
Memory | ECC-protected memory and CPUs |
I/O modules | Hot-pluggable PCI Express ExpressModules (for the Sun Blade 8000 Chassis only) and Network Express Modules |
Server Module (blade) disk drives | Hot-pluggable; configurable in RAID-0 (striping) and RAID-1 (mirroring) configuration |
Chassis Monitoring Modules | Hot-pluggable; active/standby operation with two CMMs installed |
Service processors | Redundant connection to the internal management network |
Sun ILOM and system management | Intelligent per-blade and chassis-wide management functions; Sun ILOM continues to function and be accessible when the operating system goes offline or the system is powered off; provides remote management of the blades and remote floppy and CD-ROM emulation |
Hardware upgrades | No tools required to access user-upgradeable modules |
Software upgrades | Network-based booting and network-based operating system and BIOS upgrades |
Power-on and restart | Automatic server restart; network-based booting capability |
Troubleshooting | Troubleshooting includes: |