Repairing Faults and Defects

After Fault Management has identified a faulted component in your system, you should repair it. A repair can happen in one of two ways: implicitly or explicitly.

  • Implicit repair – An implicit repair can occur when the faulty component is replaced or removed, provided the component has serial number information that the Fault Manager daemon can track. The serial number information lets the Fault Manager daemon determine when a component has been removed from operation, either through replacement or other means. When the daemon detects such a removal, it no longer displays the affected resource in fmadm list output. The resource remains in the daemon's internal resource cache until the fault event is 30 days old, at which point it is purged.

  • Explicit repair – An explicit repair is required if no serial number is available for the field-replaceable unit (FRU). For example, CPUs have no serial numbers. In these cases, the Fault Manager daemon cannot detect that the FRU was replaced.

    Use the fmadm command to explicitly mark a fault as repaired. The options include:

    • fmadm replaced label

    • fmadm repaired label

    • fmadm acquit label [uuid]

    • fmadm acquit uuid

    Where a command accepts either a UUID or a label as its argument, it is better to use the label. For example, the label /SYS/MB/P0 represents the CPU labeled "P0" on the motherboard.

    If a FRU has multiple faults against it and you want to replace the FRU only one time, use the fmadm replaced command against the FRU.
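
The explicit repair forms listed above can be sketched as a small shell helper. This is an illustrative sketch only: the function name fm_repair_cmd is hypothetical and not part of fmadm. It prints the fmadm command line for a given action rather than running it, so that you can review the command before executing it with suitable privileges on the affected system.

```shell
#!/bin/sh
# Hypothetical helper (not part of fmadm): prints the fmadm command line
# for an explicit repair. It does not run fmadm itself.
fm_repair_cmd() {
    action=${1:-}   # replaced | repaired | acquit
    target=${2:-}   # FRU label such as /SYS/MB/P0 (acquit also accepts a UUID)
    uuid=${3:-}     # optional, acquit only: follows the label

    if [ -z "$target" ]; then
        echo "usage: fm_repair_cmd action label [uuid]" >&2
        return 1
    fi

    case $action in
        replaced|repaired)
            # fmadm replaced label / fmadm repaired label
            echo "fmadm $action $target"
            ;;
        acquit)
            # fmadm acquit label [uuid] / fmadm acquit uuid
            echo "fmadm acquit $target${uuid:+ $uuid}"
            ;;
        *)
            echo "unknown action: $action" >&2
            return 1
            ;;
    esac
}

# The CPU labeled "P0" on the motherboard was physically replaced.
fm_repair_cmd replaced /SYS/MB/P0
```

The final line prints fmadm replaced /SYS/MB/P0, which is the command you would run after replacing that CPU. Printing instead of executing is a deliberate choice here, because marking a fault as repaired changes the Fault Manager daemon's record of the resource.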