fmadm - fault management configuration tool
fmadm [-q] [subcommand [arguments]]
The fmadm utility can be used by administrators and service personnel to view and modify system configuration parameters maintained by the Solaris Fault Manager, fmd(8). fmd receives symptomatic telemetry associated with conditions detected by the system software, diagnoses the telemetry into faults, defects, or alerts, and initiates proactive self-healing activities such as disabling faulty components.
fmadm can be used to do the following:
View the set of diagnosis engines and agents that are currently participating in fault management.
View the list of system components that have been diagnosed as associated with a fault, defect, or alert.
Perform administrative tasks related to these entities.
The Fault Manager attempts to automate as many activities as possible, so use of fmadm is typically not required. When the Fault Manager needs help from an administrator, service repair technician, or Oracle, it produces a message indicating its needs and refers you to a knowledge article on the Oracle web site. The web site might ask you to use fmadm or one of the other fault management utilities to gather more information or perform additional tasks. The fmd(8), fmdump(8), and fmstat(8) man pages and the Securing Systems and Attached Devices in Oracle Solaris 11.4 guide describe other tools for observing fault management activities.
One responsibility of the Fault Manager is to keep track of the location of components. At the chassis level, the fmadm *-alias subcommands manage the mapping from a chassis, identified as chassis-name.chassis-serial, to an alias-id. The administered alias-id is intended to describe the physical location of a chassis.
The fmadm utility requires the user to be assigned the solaris.fm.read RBAC authorization ("Fault Management" or "Fault Information" RBAC profile) for read operations, or the solaris.fm.modify RBAC authorization ("Fault Management" RBAC profile) for modify operations. The fmadm load subcommand requires that the user possess all privileges.
The fmadm command accepts the following subcommands. Some of the subcommands accept or require additional options and operands. The acquit, load, unload, repaired, replaced, reset, and rotate subcommands are intended for trained technical personnel. Use of these subcommands without the specific guidance of, for example, a Knowledge Base article is not recommended.
Notify the Fault Manager that the specified resource is not to be considered to be a suspect in the event identified by uuid, or if no UUID is specified, then in any faults, defects, or alerts that have been detected. The fmadm acquit command should be used only at the direction of a documented Oracle repair procedure. Administrators might need to apply additional commands to re-enable a previously faulted resource.
Notify the Fault Manager that the event identified by uuid can be safely ignored. The fmadm acquit command should be used only at the direction of a documented Oracle repair procedure. Administrators might need to apply additional commands to re-enable any previously faulted resources.
The add-alias subcommand is used to establish alias-id as a managed alias for the chassis-name.chassis-serial chassis. When a managed alias is defined, the /dev/chassis devchassis(4FS) name space representation of the chassis will use the more meaningful alias-id instead of the chassis-name.chassis-serial.
# fmadm add-alias SUN-Storage-J4410.1039QAQ007 RACK29.U25-28
The command shown above will verify that the new mapping does not conflict with existing mappings. In the case of conflict, no mapping change occurs. This command completes when the associated name space updates are complete. If the updated name space does not use the new alias-id, a warning is printed, but the mapping is updated. If the name space update takes too long, a warning is printed.
The add-alias subcommand also accepts "SYS" as an identifier for the main chassis, instead of requiring the chassis-name.chassis-serial.
If an optional comment is provided, the comment is preserved and will be displayed by a subsequent lookup-alias or list-alias command. See also remove-alias and sync-alias.
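The chassis identifier in the add-alias example above has the form chassis-name.chassis-serial. The following is a minimal sketch of how a script might split such an identifier and assemble the command line. It assumes that the serial number contains no '.' character, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Identifier and alias taken from the add-alias example above.
chassis_id="SUN-Storage-J4410.1039QAQ007"
alias_id="RACK29.U25-28"

# Assumption: the serial number contains no '.', so the last '.'
# separates chassis-name from chassis-serial.
chassis_name=${chassis_id%.*}      # text before the last '.'
chassis_serial=${chassis_id##*.}   # text after the last '.'
echo "name=$chassis_name serial=$chassis_serial"

# Build and print the command instead of executing it.
cmd="fmadm add-alias $chassis_id $alias_id"
echo "$cmd"
```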
Notify the Fault Manager that any alert events associated with the specified location label or uuid or identified by class@resource should be cleared. This command can only be applied to an alert, not to a defect or fault.
Display the configuration of the Fault Manager itself, including the module name, version, and description of each component module. Fault Manager modules provide services such as automated diagnosis, self-healing, and messaging for hardware and software present on the system.
This command is an alias for the fmadm list command.
Flush the information cached by the Fault Manager for the specified resource, for any faults, defects, or alerts for which the resource has already been repaired, acquitted or replaced.
Display status information for resources that the Fault Manager currently believes to be associated with a fault, defect, or alert. See also list-alert, list-defect, and list-fault subcommands.
The following options are supported:
Display all resources that might be associated with a fault, defect, or alert. By default, the fmadm list command only lists output for resources for which a fault, defect, or alert is still active. If you specify the -a option, all resource information cached by the Fault Manager is listed, including information for resources that might have already been acquitted or repaired, or might no longer be present in the system.
Display FRUs (Field Replaceable Units) that are associated with a fault, defect, or alert.
Pipe output through a pager with form feed between each event.
Display affected Solaris resources with their Identifier (FMRI) and their fault management state.
Display a one-line summary for each event.
Only display the event with the given uuid.
Display full output.
The percentage certainty is displayed if an event has multiple suspects, either of different classes or on different FRUs. If more than one resource is on the same FRU and it is not 100% certain that the event is associated with the FRU, the maximum percentage certainty of the possible suspects on the FRU is displayed.
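The per-FRU rule described above (display the maximum certainty among the suspects that share a FRU) can be illustrated with a small sketch. The FRU labels and percentages below are hypothetical and are not taken from real fmadm output; this shows only the arithmetic, not how the Fault Manager is implemented.

```shell
#!/bin/sh
# Hypothetical suspect list, one "<FRU label> <percentage certainty>" per line.
suspects='HDD11 33
HDD11 50
HDD12 17'

# For each FRU, keep the largest certainty among its suspects.
per_fru=$(printf '%s\n' "$suspects" |
    awk '{ c = $2 + 0; if (!($1 in max) || c > max[$1]) max[$1] = c }
         END { for (f in max) print f, max[f] "%" }' |
    sort)
printf '%s\n' "$per_fru"
```

With this input, HDD11 is reported at 50% (the larger of 33 and 50) and HDD12 at 17%.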
The Fault Manager associates one of the following states with every FRU that has been diagnosed as having a fault, defect, or alert.
The FRU has been diagnosed by the Fault Manager as being associated with a fault, defect, or alert, and is still present in the system.
The FRU has been diagnosed by the Fault Manager as being associated with a fault, defect, or alert and is no longer present in the system.
The FRU has been removed from the system and a replacement FRU has taken its place.
The command fmadm repaired has been used to notify the Fault Manager that the FRU has been repaired.
The command fmadm acquit has been used to notify the Fault Manager that the FRU has been acquitted.
The state of any affected Solaris resources is also displayed. For a faulty FRU, the state of the associated resources can be one of:
Faulted and taken out of service
Faulted but still in service
Faulted but still providing degraded service
For a FRU that has been repaired, replaced or acquitted, the state of the associated resources can be one of:
Okay and in service
Out of service, but associated components no longer faulty
Service degraded, but associated components no longer faulty
This subcommand behaves like list, but only displays suspect lists that contain an alert event.
The list-alias command is used to display all comments and mappings. See also add-alias, lookup-alias, remove-alias, and sync-alias subcommands.
This subcommand behaves like list, but only displays suspect lists that contain a defect event.
This subcommand behaves like list, but only displays suspect lists that contain a fault event.
Load the specified Fault Manager module. The path must be an absolute path and must refer to a module present in one of the defined directories for modules. Typically, the use of this command is not necessary because the Fault Manager loads modules automatically when Solaris initially boots or as needed. See also fmadm unload.
The lookup-alias subcommand can be used to determine what the current mapping is. The following is an example command.
# fmadm lookup-alias SUN-Storage-J4410.1039QAQ007
See also add-alias, list-alias, remove-alias, and sync-alias.
The fmadm remove-alias subcommand is used to remove a chassis-name.chassis-serial to alias-id mapping.
# fmadm remove-alias RACK29.U25-28
The command shown above completes when the associated name space updates are complete. See also add-alias, list-alias, lookup-alias, and sync-alias.
Notify the Fault Manager that a repair procedure has been carried out on the specified resource. The fmadm repaired command should be used only at the direction of a documented Oracle repair procedure. Administrators might need to apply additional commands to re-enable a previously faulted resource.
Notify the Fault Manager that the specified resource has been replaced. This command should be used in those cases in which the Fault Manager is unable to automatically detect the replacement. The fmadm replaced command should be used only at the direction of a documented Oracle repair procedure. Administrators might need to apply additional commands to re-enable a previously faulted resource.
Reset the specified Fault Manager module or module subcomponent. If the -s option is present, the specified Soft Error Rate Discrimination (SERD) engine is reset within the module. If the -s option is not present, the entire module is reset and all persistent state associated with the module is deleted. The fmadm reset command should only be used at the direction of a documented Oracle repair procedure. The use of this command is typically not necessary because the Fault Manager manages its modules automatically.
The fmadm rotate command is a helper command for the logadm(8) command, so that logadm can rotate live log files correctly. The fmadm rotate command is not intended to be invoked directly (and invoking it directly is likely to lose log history). Use one of the following commands to cause the appropriate logfile to be rotated, if the current logfile is not zero in size:
# logadm -p now -s 1b /var/fm/fmd/errlog
# logadm -p now -s 1b /var/fm/fmd/fltlog
# logadm -p now -s 1b /var/fm/fmd/infolog
# logadm -p now -s 1b /var/fm/fmd/infolog_hival
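The logadm invocations above differ only in the log file name, so a script could generate them in a loop. The following is a dry-run sketch: the function name rotate_cmds is hypothetical, and the function only prints the commands so that nothing is rotated by accident.

```shell
#!/bin/sh
# Print (do not run) the logadm command for each Fault Manager log
# file named in this section.
rotate_cmds() {
    for log in errlog fltlog infolog infolog_hival; do
        echo "logadm -p now -s 1b /var/fm/fmd/$log"
    done
}

rotate_cmds
```

To perform the rotation, an administrator with the appropriate privileges would run the printed commands.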
The sync-alias subcommand is used to hand-import a set of mappings in bulk. Two copies of the current mappings are maintained:
To import a set of mappings in bulk, you can update the /etc/dev/chassis_aliases file and then run fmadm sync-alias. See also add-alias, list-alias, lookup-alias, and remove-alias.
Unload the specified Fault Manager module. Specify module using the basename listed in the fmadm config output. Typically, the use of this command is not necessary because the Fault Manager loads and unloads modules automatically based on the system configuration. See also fmadm load.
The following options are supported:
Set quiet mode. fmadm does not produce messages indicating the result of successful operations to standard output.
The following operands are supported:
The name of a subcommand listed in SUBCOMMANDS.
One or more options or arguments appropriate for the selected subcommand, as described in SUBCOMMANDS. Among these arguments are fmri, uuid, and label. These identify resources that are the objects of fmadm subcommands. Use fmadm list to obtain the fmri, uuid, and label for a targeted resource. In general, label is the most user-friendly of these operands. See the Examples section below.
The following exit values are returned:
An error occurred. Errors include a failure to communicate with fmd or insufficient RBAC authorization to perform the requested operation.
Invalid command-line options were specified
The following command invokes the list subcommand, which displays the uuid, label, and fmri for a component.
# fmadm list
------------ ------------------------------------ ------------ ---------
TIME         EVENT-ID                             MSG-ID       SEVERITY
------------ ------------------------------------ ------------ ---------
Sep 09 16:15 96609fae-113c-e48c-b1cf-ebf4b0902d72 DISK-8000-3E Critical
Problem Status : open
Diag Engine    : eft / 1.16
System
   Manufacturer : Oracle-Corp.
   Name         : SUN-FIRE-X4170-SERVER
   Part Number  : unknown
   Serial Number: 0920XF508B
----------------------------------------
Suspect 1 of 1:
   Fault class: fault.io.scsi.cmd.disk.dev.rqs.derr
   Certainty  : 100%
   Affects    : dev:///:devid=id1,sd@n5000c5000940edbb//scsi_vhci/disk@g\
5000c5000940edbb
   Status     : faulted and taken out of service
   FRU
      Status         : faulty
      Location       : "/SUN-Storage-J4410.1037QAQ052/HDD11"
      Location Alias : "/RACK29.U25-28/HDD11"
      Manufacturer   : SEAGATE
      Name           : ST330057SSUN300G
      Part Number    : SEAGATE-ST330057SSUN300G
      Revision       : 0205
      Serial Number  : 000930G01CN4----3SJ01CN4
      Chassis
         Manufacturer  : Oracle-Corp.
         Name          : SUN-Storage-J4410
         Part Number   : 594-5329
         Serial Number : 1037QAQ052
...
...
In the preceding output, the uuid is the first item in the EVENT-ID column, 96609fae-113c-e48c-b1cf-ebf4b0902d72. The label is in the FRU section in the Location line, "/SUN-Storage-J4410.1037QAQ052/HDD11". In this example, an alias for the chassis has been set, and the aliased location is displayed in the Location Alias line, "/RACK29.U25-28/HDD11".
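As an illustration of where these identifiers sit in the output, the following sketch pulls the uuid and the label out of a few lines copied from the fmadm list output shown above. Because this man page states that the human-readable output is not-an-interface, treat the parsing patterns as assumptions that could break if the layout changes.

```shell
#!/bin/sh
# Lines copied from the fmadm list example in this section.
sample='Sep 09 16:15 96609fae-113c-e48c-b1cf-ebf4b0902d72 DISK-8000-3E Critical
      Location       : "/SUN-Storage-J4410.1037QAQ052/HDD11"
      Location Alias : "/RACK29.U25-28/HDD11"'

# The uuid is the first field made of five hyphen-separated groups of
# lowercase hexadecimal digits.
uuid=$(printf '%s\n' "$sample" | awk '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/) {
            print $i
            exit
        }
}')

# The label is the quoted value on the "Location" line; the pattern does
# not match the "Location Alias" line.
label=$(printf '%s\n' "$sample" |
    sed -n 's/^ *Location *: *"\(.*\)"$/\1/p')

echo "uuid=$uuid"
echo "label=$label"
```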
The fmris are available with fmdump -v:
# fmdump -v
Sep 09 16:15:36.9252 96609fae-113c-e48c-b1cf-ebf4b0902d72 DISK-8000-3E \
Diagnosed
  100% fault.io.scsi.cmd.disk.dev.rqs.derr
        Problem in: hc://:scheme=:chassis-mfg=Oracle-Corp.:chassis-name=SUN-\
Storage-J4410:chassis-part=594-5329:chassis-serial=1037QAQ052/ses-\
enclosure=0/bay=11/disk=0
           Affects: dev:///:devid=id1,sd@n5000c5000940edbb//\
scsi_vhci/disk@g5000c5000940edbb
               FRU: hc://chassis-mfg=Oracle-Corp.:chassis-name=SUN-Storage-J4410\
:chassis-part=594-5329:chassis-serial=1037QAQ052:fru-mfg=SEAGATE\
:fru-name=SEAGATE-ST330057SSUN300G:fru-part=SEAGATE-ST330057SSUN300G\
:fru-revision=0205:fru-serial=000930G01CN4--------3SJ01CN4/\
ses-enclosure=0/bay=11/disk=0
      FRU Location: "/SUN-Storage-J4410.1037QAQ052/HDD11"
Note that label is the easiest-to-use identifier. Either the aliased or the non-aliased form of the Location may be used.
Example 2 Obtaining the Module Name
The following command displays the module name for each component. The module name is specified as input to the fmadm unload command.
# fmadm config
MODULE                   VERSION STATUS  DESCRIPTION
cpumem-retire            1.1     active  CPU/Memory Retire Agent
disk-transport           1.0     active  Disk Transport Agent
eft                      1.16    active  eft diagnosis engine
..
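If only the module names are needed, for example to pick an argument for fmadm unload, the first column of the output shown above can be extracted. The following sketch works on a copy of that sample output. It assumes that module names contain no whitespace and that the first line is the header; the human-readable output is not a committed interface, so this is illustrative only.

```shell
#!/bin/sh
# Sample rows copied from the fmadm config example in this section.
sample='MODULE                   VERSION STATUS  DESCRIPTION
cpumem-retire            1.1     active  CPU/Memory Retire Agent
disk-transport           1.0     active  Disk Transport Agent
eft                      1.16    active  eft diagnosis engine'

# Skip the header row and print the first column (the module name).
modules=$(printf '%s\n' "$sample" | awk 'NR > 1 { print $1 }')
printf '%s\n' "$modules"
```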
See attributes(7) for descriptions of the following attributes:
The command-line options are Committed. The human-readable output is not-an-interface.
Oracle Solaris FMA does not perform, and is not actively involved in, SMART failure analysis or prediction. It reads the SMART data reported by the disk.
A third-party utility such as SMARTCTL can be used to view more SMART information; this access is read-only. SMART thresholds are determined by the disk manufacturer and can vary from one disk make, model, or firmware level to another. HDD manufacturers do not make SMART threshold information available, although SMARTCTL might be able to display these values.
Once a disk asserts a SMART failure prediction or warning, it must be replaced. You cannot turn off SMART failure prediction, and it is only a matter of time before the disk fails completely.
Although it is possible to tell Oracle Solaris FMA to ignore a specific SMART failure event by using the fmadm acquit command, doing so is not recommended. Once a disk asserts a SMART failure, the assertion cannot be changed and the disk must be replaced.
It is recommended to ensure that your system software and firmware (including SAS controller and disk firmware) are all kept up to date. This ensures the system has the best capabilities.