The following procedure describes how to isolate and correct DIMM ECC errors.
If the ILOM reports an ECC error or a problem with a DIMM, first complete the steps in the following procedure.
Note - This procedure cannot be completed without the assistance of Oracle service personnel. Oracle service assistance is needed to access the sunservice account and to remove and install the CPUs.
In this example, ILOM reports an error with the DIMM in CMOD 0, CPU0, slot 1. The fault LEDs on CPU0, slots 1 and 0, are lit.
Caution - Before handling components, attach an antistatic wrist strap to a chassis ground (any unpainted metal surface). The system’s printed circuit boards and hard disk drives contain components that are extremely sensitive to static electricity.
Refer to the Sun Fire X4800 Server Service Manual.
A lit LED can indicate which component has the fault.
See About DIMM Fault LEDs for the location of the Fault Remind button and DIMM fault LEDs.
Refer to the Sun Fire X4800 Server Service Manual.
If there is no obvious damage, continue with the following steps.
Make sure that the DIMM from CPU0 slot 0 is installed in CPU1 slot 0.
Note - Contact Oracle service for assistance in logging into ILOM with the sunservice account.
# cd /persist
# grep CE host_debug_err.log
Mon Mar 13 23:09:25 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Mon Mar 13 23:14:25 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 00:29:25 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 00:34:25 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 01:29:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 01:49:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 01:54:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 01:59:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 02:04:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 02:09:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 02:14:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 02:54:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 02:59:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 03:04:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
Tue Mar 14 03:09:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
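When the log is long, it can help to tally the CE entries by the location they are reported against instead of reading them line by line. The following is a minimal sketch, not an Oracle-supplied tool: the function name summarize_ce is hypothetical, and it assumes a POSIX shell with grep, sed, sort, and uniq (for example, a workstation holding a copy of host_debug_err.log), with one entry per line in the format of the example output.

```shell
# Hypothetical helper (not part of ILOM): tally correctable-error (CE)
# entries in host_debug_err.log by the location text that follows
# "ECC No-UE CE" (node, branch, and DIMM pairs).
summarize_ce() {
    # $1: path to host_debug_err.log or a copy of it
    grep 'ECC No-UE CE' "$1" \
        | sed 's/.*ECC No-UE CE //' \
        | sort \
        | uniq -c \
        | sed 's/^ *//'   # drop uniq's count padding: "<count> <location>"
}

# Usage: summarize_ce host_debug_err.log
```

For the 15 entries in the example output, this would report a single location, Node 3 Branch 1 with DIMM pairs D10/D14 and D11/D15, with a count of 15.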
Note - A BIOS SMI runs every 5 minutes to check for memory CEs, so wait at least 5 minutes after testing before checking whether the BIOS has detected a CE. Keep this in mind when determining whether the CEs follow the DIMMs.
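Because the BIOS checks for memory CEs only on its 5-minute SMI, checking the log immediately after testing can miss new errors. The sketch below records the current CE count, waits past one SMI interval, and compares. The function names and the 330-second default (5 minutes plus a 30-second margin) are assumptions, and it assumes the shell in use provides grep and sleep.

```shell
# Hypothetical helpers (not part of ILOM).
# Count CE entries currently in the log; prints 0 if there are none.
count_ce() {
    grep -c 'ECC No-UE CE' "$1" || true
}

# Record the CE count, wait past at least one BIOS SMI memory check,
# then report whether new CE entries were logged.
# $1: log path; $2: seconds to wait (default 330 = 5 minutes + 30 s margin)
check_new_ce() {
    local log="$1" wait_s="${2:-330}" before after
    before=$(count_ce "$log")
    sleep "$wait_s"
    after=$(count_ce "$log")
    if [ "$after" -gt "$before" ]; then
        echo "new CEs: $((after - before))"
    else
        echo "no new CEs"
    fi
}

# Usage: check_new_ce host_debug_err.log
```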
If the error now appears in CPU1, it has followed the DIMMs. Replace the DIMMs in slots 0 and 1.
If the error still appears in CPU0, the problem is not related to an individual DIMM. Instead, it might be caused by CPU0 or by the DIMM slot. Continue with the rest of the procedure.
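The DIMM-swap check reduces to comparing where the error is reported before and after the DIMMs are moved. The following sketch encodes that comparison. The function name dimm_swap_verdict and the CPU labels passed to it are hypothetical, and it assumes you have already determined which CPU each error report maps to.

```shell
# Hypothetical helper (not part of ILOM) for the DIMM-swap check.
# $1: CPU the error was reported against before the move (for example CPU0)
# $2: CPU the suspect DIMMs were moved to (for example CPU1)
# $3: CPU the error is reported against after waiting for the BIOS check
dimm_swap_verdict() {
    if [ "$3" = "$2" ]; then
        echo "error followed the DIMMs: replace the DIMMs in slots 0 and 1"
    elif [ "$3" = "$1" ]; then
        echo "error stayed with $1: suspect $1 or the DIMM slot, continue the procedure"
    else
        echo "unexpected location: $3" >&2
        return 1
    fi
}

# Usage, matching the example in this procedure:
#   dimm_swap_verdict CPU0 CPU1 CPU1
```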
Note - To complete the rest of the procedure, contact Oracle service for assistance with removing and installing the CPUs. The CPUs are field replaceable units (FRUs).
In other words, switch the positions of the two CPUs.
Refer to the Sun Fire X4800 Server Service Manual.
See Step 10 for instructions.
If the error continues to appear in the same DIMM slot, most likely there is an issue with the DIMM slot. Return the board to the Support Center for replacement.
If the error follows the CPU, replace the CPU and confirm that the memory error does not return.
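The CPU-swap check is also a comparison: after CPU0 and CPU1 trade sockets, compare the socket the error is reported against before and after the swap. If it is the same socket, the error stayed with the DIMM slot; if it is the other socket, it moved with the CPU. A minimal sketch, with a hypothetical function name, assuming the DIMMs stay in their slots during the swap and that each error report can be mapped to a CPU socket:

```shell
# Hypothetical helper (not part of ILOM) for the CPU-swap check.
# $1: socket the error was reported against before the CPUs were swapped
# $2: socket the error is reported against after the swap
cpu_swap_verdict() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: cpu_swap_verdict <socket-before> <socket-after>" >&2
        return 1
    fi
    if [ "$1" = "$2" ]; then
        # Same socket position, so the same DIMM slot: the slot is suspect.
        echo "error stayed in the same DIMM slot: return the board for replacement"
    else
        # The error moved with the CPU to its new socket.
        echo "error followed the CPU: replace the CPU and confirm the error does not return"
    fi
}

# Usage: cpu_swap_verdict CPU0 CPU1
```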