|Managing Services and Faults in Oracle Solaris 11.1 Oracle Solaris 11.1 Information Library|
After Fault Management has faulted a component in your system, you will want to repair it. A repair can happen in one of two ways: implicitly or explicitly.
An implicit repair can occur when the faulty component is replaced or removed, provided the component has serial number information that the Fault Manager daemon can track. On many SPARC based systems, serial number information is included in the FMRIs so that the Fault Manager daemon can determine when components have been removed from operation, either through replacement or other means (for example, blacklisting). When such detections occur, the Fault Manager daemon no longer displays the affected resource in fmadm faulty output. The resource is maintained in the daemon's internal resource cache until the fault event is 30 days old, at which point it is purged.
Implicit repairs do not apply to all systems. Sometimes, even though the FMRIs include a chassis-id, no FRU serial number information is available. In that case, the Fault Manager daemon cannot detect a FRU replacement, and an explicit repair is required.
The fmadm command is used to explicitly mark a fault as repaired. Four syntaxes are associated with repairs for this command:
fmadm replaced fmri | label
fmadm repaired fmri | label
fmadm acquit fmri | label
fmadm acquit uuid [ fmri | label]
Although these four commands can take FMRIs and UUIDs as arguments, the label is the preferred argument. If a FRU has multiple faults against it, you want to replace the FRU only one time. If you issue the fmadm replaced command against the label, the FRU is marked as replaced in every outstanding case in which it is a suspect.
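The effect of addressing a FRU by label can be sketched with a small model. This is an illustrative assumption, not the Fault Manager's implementation: the case identifiers (`case-a`, `case-b`), the labels (`FRU-1`, `FRU-2`), and the helper `mark_replaced` are all hypothetical.

```python
from typing import Dict

# Hypothetical open cases: case UUID -> {FRU label: suspect status}.
cases: Dict[str, Dict[str, str]] = {
    "case-a": {"FRU-1": "faulty", "FRU-2": "faulty"},
    "case-b": {"FRU-1": "faulty"},
}


def mark_replaced(open_cases: Dict[str, Dict[str, str]], label: str) -> int:
    """Mark `label` as replaced in every case that lists it.

    Returns the number of cases that were updated.
    """
    updated = 0
    for suspects in open_cases.values():
        if label in suspects:
            suspects[label] = "replaced"
            updated += 1
    return updated


# One operation against the label covers both outstanding cases.
updated_count = mark_replaced(cases, "FRU-1")
```

In this model a single call updates both cases that name `FRU-1`, while `FRU-2` stays faulty, which mirrors why one command against the label is enough for a FRU with multiple faults.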
You can use the fmadm replaced command to indicate that the suspect FRU has been replaced or removed.
If the system automatically discovers that a FRU has been replaced (the serial number has changed), then this discovery is treated in the same way as if fmadm replaced had been typed on the command line. The fmadm replaced command is not allowed if fmd can automatically confirm that the FRU has not been replaced (the serial number has not changed).
If the system automatically discovers that a FRU has been removed but not replaced, the suspect is displayed as not present. The suspect is not considered permanently removed until the fault event is 30 days old, at which point it is purged.
You can use the fmadm repaired command when some physical repair has been carried out to resolve the problem, other than replacing a FRU. Examples of such repairs include reseating a card or straightening a bent pin.
You typically use the fmadm acquit command when you determine that the resource was not the cause of the fault. Acquittal can also happen implicitly when additional error events occur and the diagnosis is refined.
Replacement takes precedence over repair, and both replacement and repair take precedence over acquittal. Thus, you can acquit a component and then subsequently repair it, but you cannot acquit a component that has already been repaired.
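The precedence rule can be expressed as a simple ordering check. The following is a minimal sketch under stated assumptions: the status names, the numeric ranks, and the function `apply_action` are hypothetical, and treating a repeated action of the same rank as allowed is an assumption that the text does not address.

```python
from typing import Optional

# Higher number wins: replacement > repair > acquittal.
PRECEDENCE = {"acquitted": 1, "repaired": 2, "replaced": 3}


def apply_action(current: Optional[str], action: str) -> str:
    """Return the new status of a suspect, or raise if the action is outranked.

    `current` is None when no repair action has been recorded yet.
    """
    if action not in PRECEDENCE:
        raise ValueError(f"unknown action: {action}")
    if current is not None and PRECEDENCE[action] < PRECEDENCE[current]:
        # For example, acquitting a component that is already repaired.
        raise ValueError(f"cannot mark as {action}: already {current}")
    return action
```

For example, `apply_action("acquitted", "repaired")` succeeds, whereas `apply_action("repaired", "acquitted")` raises `ValueError`.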
A case is considered repaired (moves into the FMD_CASE_REPAIRED state and a list.repaired event is generated) when either its UUID is acquitted, or all suspects have been either repaired, replaced, removed, or acquitted.
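As a rough illustration, the condition can be written as a predicate. The status strings and the function name `case_is_repaired` are hypothetical and do not reflect the internal data structures of fmd.

```python
from typing import Iterable

# Suspect statuses that count toward closing a case, per the rule above.
RESOLVED = {"repaired", "replaced", "removed", "acquitted"}


def case_is_repaired(suspect_statuses: Iterable[str], uuid_acquitted: bool = False) -> bool:
    """Return True if the case would be considered repaired.

    Either the case UUID itself was acquitted, or every suspect has been
    repaired, replaced, removed, or acquitted.
    """
    return uuid_acquitted or all(status in RESOLVED for status in suspect_statuses)
```

A case with one suspect still marked faulty stays open unless the case UUID itself is acquitted.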
Usually fmd automatically acquits a suspect in a multi-element suspect list, or Support Services gives you instructions to perform a manual acquittal. You would only want to acquit by FMRI or label if you determined that the resource was not guilty in all current cases in which it is a suspect. However, to manually acquit a FRU in one case while it remains a suspect in all others, use the following syntax, which lets you specify both UUID and FMRI, or UUID and label:
fmadm acquit uuid [fmri|label]
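The difference between acquitting by label alone and acquitting by UUID and label can be sketched as follows. This is a hypothetical model, not fmd behavior verified against an implementation: the function `acquit`, the case identifiers, and the labels are invented for illustration, and acquitting a whole case by UUID alone is not modeled.

```python
from typing import Dict, List, Optional


def acquit(cases: Dict[str, Dict[str, str]], label: str,
           uuid: Optional[str] = None) -> List[str]:
    """Acquit `label` in one case (when `uuid` is given) or in every case.

    Returns the UUIDs of the cases that were changed.
    """
    targets = [uuid] if uuid is not None else list(cases)
    changed = []
    for case_uuid in targets:
        suspects = cases.get(case_uuid, {})
        if label in suspects:
            suspects[label] = "acquitted"
            changed.append(case_uuid)
    return changed


# Hypothetical cases in which FRU-1 is a suspect twice.
open_cases = {
    "case-a": {"FRU-1": "faulty", "FRU-2": "faulty"},
    "case-b": {"FRU-1": "faulty"},
}

# Scoped acquittal: FRU-1 is cleared in case-a only.
scoped = acquit(open_cases, "FRU-1", uuid="case-a")
```

After the scoped call, `FRU-1` is acquitted in `case-a` but remains a suspect in `case-b`. Omitting `uuid` would acquit it in both.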