Sun Fire X4800 Server Product Documentation

Correcting DIMM Errors

The following procedure describes how to isolate and correct DIMM ECC errors.

How to Isolate and Correct DIMM ECC Errors

If ILOM reports an ECC error or another problem with a DIMM, complete the steps in the following procedure.


Note - This procedure cannot be completed without the assistance of Oracle service personnel. Oracle service assistance is needed to access the sunservice account and to remove and install the CPUs.


In this example, ILOM reports an error with the DIMM in CMOD 0, CPU0, slot 1. The fault LEDs on CPU0, slots 1 and 0, are lit.



Caution - Before handling components, attach an antistatic wrist strap to a chassis ground (any unpainted metal surface). The system’s printed circuit boards and hard disk drives contain components that are extremely sensitive to static electricity.


  1. If you have not already done so, shut down the Sun Fire X4800 system to standby power mode, remove the CMOD, and remove the cover of the CMOD.

    Refer to the Sun Fire X4800 Server Service Manual.

  2. Inspect the installed DIMMs to ensure that they comply with the DIMM population rules in the Sun Fire X4800 Server Service Manual.
  3. Press the Fault Remind button on the CMOD to light the fault LEDs of any faulty DIMMs.

    Lit fault LEDs indicate the DIMMs associated with the fault.

    See About DIMM Fault LEDs for the location of the Fault Remind button and DIMM fault LEDs.

  4. If any DIMM fault LEDs are lit, remove the DIMMs from the CMOD.

    Refer to the Sun Fire X4800 Server Service Manual.

  5. Visually inspect the DIMMs for physical damage, dust, or any other contamination on the connector or circuits.
  6. Visually inspect the DIMM slots for physical damage, such as cracked or broken plastic.
  7. If you find conductive debris in the socket, clean the socket, reseat the DIMMs, and then rerun the tests that caused the original memory failure to see whether it recurs.

    If there is no obvious damage, continue with the following steps.

  8. Exchange the suspect DIMM pair with the DIMM pair in the same slots on the other CPU. Ensure that all DIMMs are inserted correctly with the ejector latches secured. Using the slot numbers from the example:
    1. Remove the DIMMs from CPU0, slots 1 and 0.
    2. Remove the DIMMs from CPU1, slots 1 and 0.
    3. Install the DIMMs from CPU0 into slots 1 and 0 on CPU1.

      Make sure that the DIMM from CPU0 slot 0 is installed in CPU1 slot 0.

    4. Install the DIMMs from CPU1 into slots 1 and 0 on CPU0.
  9. Power on the server and run the test that reported the error again.
  10. Review the log file.
    1. Log on to ILOM using the sunservice account.

      Note - Contact Oracle service for assistance with logging in to ILOM with the sunservice account.


    2. Change to the /persist directory:
      # cd /persist
    3. Search host_debug_err.log for correctable errors (CEs), as shown in the following example:
      # grep CE host_debug_err.log
      
      Mon Mar 13 23:09:25 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Mon Mar 13 23:14:25 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 00:29:25 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 00:34:25 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 01:29:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 01:49:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 01:54:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 01:59:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 02:04:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 02:09:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 02:14:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 02:54:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 02:59:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 03:04:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15
      Tue Mar 14 03:09:26 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15

    Note - A BIOS SMI (system management interrupt) runs every 5 minutes to check for memory CEs, so wait at least 5 minutes after testing before checking whether the BIOS has detected a CE. Keep this in mind when determining whether the CEs follow the DIMMs.


    • If the error now appears on CPU1, the error followed the DIMMs. Replace the DIMMs now installed in CPU1 slots 0 and 1.

    • If the error still appears on CPU0, the problem is not related to an individual DIMM. Instead, it might be caused by CPU0 or by the DIMM slot. Continue with the rest of the procedure.


    Note - To complete the rest of the procedure, contact Oracle service for assistance with removing and installing the CPUs. The CPUs are field replaceable units (FRUs).


  11. Shut down the server again.
  12. Remove the CMOD from the system.
  13. Remove both CPUs.
  14. Reinstall each CPU in the socket that the other CPU originally occupied.

    In other words, switch the positions of the two CPUs.

    Refer to the Sun Fire X4800 Server Service Manual.

  15. Power on the Sun Fire X4800 system.
  16. Run the test that reported the original error again.
  17. Review the log file.

    See Step 10 for instructions.

    • If the error still appears in the same DIMM slot, the DIMM slot is the most likely cause. Return the board to the Support Center for replacement.

    • If the error follows the CPU, replace the CPU and confirm that the memory error does not return.
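The log reviews in Steps 10 and 17 come down to counting CE entries per reported location before and after a swap. The following shell sketch is a hypothetical helper, not an Oracle tool. It assumes that a copy of host_debug_err.log is available on a workstation with a POSIX shell and awk (the ILOM sunservice shell is not assumed to provide these) and that each entry occupies one line in the format shown in Step 10. The function name ce_tally and its optional time filter are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper, not an Oracle-supplied tool. Run it on a workstation
# against a copy of host_debug_err.log. Assumes one entry per line, e.g.:
#   Mon Mar 13 23:09:25 2000 ID ffff V ECC No-UE CE Node 3 Branch 1 DIMM Pair(s): D10/D14 D11/D15

# Usage: ce_tally LOGFILE [SINCE]
# Prints "<count><TAB><location>" for each distinct CE location, most
# frequent first. SINCE, in the form "YYYY-MM-DD hh:mm:ss", skips entries
# logged before that time (for example, before the DIMMs were exchanged).
ce_tally() {
    awk -v since="${2:-}" '
    BEGIN {
        # Map month abbreviations to numbers for a sortable time stamp.
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= 12; i++) month[names[i]] = i
    }
    / CE / {
        # Fields 2-5 are month, day, hh:mm:ss, and year.
        ts = sprintf("%04d-%02d-%02d %s", $5, month[$2], $3, $4)
        if (since != "" && ts < since) next
        # Everything after " CE " is treated as the reported location.
        loc = $0
        sub(/^.* CE /, "", loc)
        sub(/[ \t]+$/, "", loc)
        count[loc]++
    }
    END { for (loc in count) printf "%d\t%s\n", count[loc], loc }
    ' "$1" | sort -t "$(printf '\t')" -k1,1nr
}
```

For example, ce_tally host_debug_err.log "2000-03-14 04:00:00" counts only the CEs logged at or after that time. Give SINCE in the log's own clock (the sample entries are dated 2000, so that clock may not match wall time), and wait at least 5 minutes after the test for the BIOS SMI check before running it. If the reported location changes after a swap, the errors followed the moved components; if it stays the same, they did not. This section does not describe how the Node, Branch, and DIMM pair labels map to CMOD, CPU, and slot numbers, so compare the reported locations rather than assuming a mapping.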