Memory Fault Handling Overview - SPARC Enterprise T5120 and T5220 Servers Topic Set

Memory Fault Handling Overview

Several features determine how the memory subsystem is configured and how memory faults are handled. Understanding these features helps you identify and repair memory problems.

The following server features manage memory faults:

POST – By default, POST runs when the server is powered on.

For correctable memory errors (CEs), POST forwards the error to the Solaris Predictive Self-Healing (PSH) daemon for error handling. If POST detects an uncorrectable memory fault, it displays the fault with the device name of the faulty FB-DIMMs, logs the fault, and then disables the faulty FB-DIMMs. Depending on the memory configuration and the location of the faulty FB-DIMM, POST disables either half of the physical memory in the system, or half of the physical memory and half of the processor threads.

When this offlining occurs in normal operation, you must replace the faulty FB-DIMMs identified in the fault message, and then enable the disabled FB-DIMMs with the ILOM command set device component_state=enabled, where device is the name of the FB-DIMM being enabled (for example, set /SYS/MB/CMP0/BR0/CH0/D0 component_state=enabled).

Solaris Predictive Self-Healing (PSH) technology – PSH uses the Fault Manager daemon (fmd) to watch for various kinds of faults. When a fault occurs, it is assigned a unique fault ID (UUID) and logged. PSH reports the fault and recommends replacing the FB-DIMMs associated with it.
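
The re-enable step in the POST description follows a fixed pattern: set, the device name, then component_state=enabled. As a minimal sketch, the Python helper below only formats that command line for a given FB-DIMM name; it does not connect to the service processor. The function name fbdimm_enable_command is hypothetical, and the check that a device name begins with "/" is an assumption based on the single example path above.

```python
def fbdimm_enable_command(device: str) -> str:
    """Build the ILOM CLI line that re-enables one disabled component.

    `device` is the ILOM device name, for example
    /SYS/MB/CMP0/BR0/CH0/D0. This helper only formats text; it does
    not contact a service processor.
    """
    device = device.strip()
    if not device.startswith("/"):
        # Assumption: ILOM device names are absolute paths such as /SYS/...
        raise ValueError(f"not an ILOM device path: {device!r}")
    return f"set {device} component_state=enabled"


if __name__ == "__main__":
    # Reproduces the example command given in the text above.
    print(fbdimm_enable_command("/SYS/MB/CMP0/BR0/CH0/D0"))
```

Run the resulting line at the ILOM prompt only after the faulty FB-DIMM has been replaced.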

If you suspect that the server has a memory problem, run the ILOM show faulty command. This command lists memory faults and identifies the FB-DIMMs associated with each fault.
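
If the show faulty output is saved to a text file, the affected FB-DIMM names can be pulled out with a short script. The sketch below assumes the output is a three-column table delimited by "|" (target, property, value) in which a row whose property is fru carries the device name. The SAMPLE text is illustrative only and was not captured from a real server; check the layout against the output of your own system before relying on it. The function name faulty_frus is hypothetical.

```python
def faulty_frus(show_faulty_output: str) -> list[str]:
    """Return the unique FRU names found in saved `show faulty` text.

    Assumes a 'Target | Property | Value' table layout; rows that do
    not have exactly three '|'-separated cells are ignored.
    """
    frus: list[str] = []
    for line in show_faulty_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) == 3 and cells[1] == "fru" and cells[2] not in frus:
            frus.append(cells[2])
    return frus


# Illustrative sample only (assumed layout, not real captured output).
SAMPLE = """\
Target              | Property    | Value
--------------------+-------------+-------------------------
/SP/faultmgmt/0     | fru         | /SYS/MB/CMP0/BR0/CH0/D0
/SP/faultmgmt/0     | timestamp   | (illustrative)
"""

if __name__ == "__main__":
    for fru in faulty_frus(SAMPLE):
        print(fru)
```

Each name returned is in the same form that the set device component_state=enabled command expects.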

Copyright © 2009, 2011, Oracle and/or its affiliates. All rights reserved.