Repairing Faults and Defects
After Fault Management has identified a faulted component in your system, you should repair it. A repair can happen in one of two ways: implicitly or explicitly.
- Implicit repair – An implicit repair can occur when the faulty component is replaced or removed, provided the component has serial number information that the Fault Manager daemon can track. The system's serial number information is included so that the Fault Manager daemon can determine when components have been removed from operation, either through replacement or other means. When such detections occur, the Fault Manager daemon no longer displays the affected resource in fmadm list output. The resource is maintained in the daemon's internal resource cache until the fault event is 30 days old, at which point it is purged.
- Explicit repair – An explicit repair is required if no FRU serial number is available. For example, CPUs have no serial numbers. In these cases, the Fault Manager daemon cannot detect a FRU replacement.
Use the fmadm command to explicitly mark a fault as repaired. The options include:
  - fmadm replaced label
  - fmadm repaired label
  - fmadm acquit label [uuid]
  - fmadm acquit uuid
Although these four commands can take UUIDs or labels as arguments, it is better to use the label. For example, the label /SYS/MB/P0 represents the CPU labeled "P0" on the motherboard.
If a FRU has multiple faults against it and you want to replace the FRU only one time, use the fmadm replaced command against the FRU.
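The explicit repair commands above could be wrapped in a small shell helper, sketched here for illustration. The function name mark_fault and the variable FMADM_RUN are hypothetical and are not part of fmadm. By default the helper only prints the command it would run, so you can review it before applying it on a real system.

```shell
# Hypothetical helper (not part of fmadm): builds an explicit-repair command.
# Usage: mark_fault replaced|repaired|acquit label-or-uuid [uuid]
# By default the command is only printed (dry run); set FMADM_RUN=1 to run it.
mark_fault() {
    action=$1
    case "$action" in
        replaced|repaired|acquit) ;;
        *)
            echo "usage: mark_fault replaced|repaired|acquit label-or-uuid [uuid]" >&2
            return 2
            ;;
    esac
    shift
    # At least one argument (a FRU label or a fault UUID) is required.
    if [ "$#" -lt 1 ]; then
        echo "mark_fault: missing label or UUID" >&2
        return 2
    fi
    if [ "${FMADM_RUN:-0}" = "1" ]; then
        fmadm "$action" "$@"
    else
        echo "fmadm $action $*"
    fi
}

# Dry run for the CPU labeled "P0" on the motherboard.
mark_fault replaced /SYS/MB/P0   # prints: fmadm replaced /SYS/MB/P0
```

Passing the label rather than a UUID follows the recommendation above, and a single replaced call against the FRU covers the case where the FRU has multiple faults against it.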