man pages section 8: System Administration Commands

Updated: Thursday, June 13, 2019
 
 

fmadm(8)

Name

fmadm - fault management configuration tool

Synopsis

fmadm [-q] [subcommand [arguments]]

Description

The fmadm utility can be used by administrators and service personnel to view and modify system configuration parameters maintained by the Solaris Fault Manager, fmd(8). fmd receives symptomatic telemetry associated with conditions detected by the system software, diagnoses the telemetry into faults, defects, or alerts, and initiates proactive self-healing activities such as disabling faulty components.

fmadm can be used to do the following:

  • View the set of diagnosis engines and agents that are currently participating in fault management.

  • View the list of system components that have been diagnosed as associated with a fault, defect, or alert.

  • Perform administrative tasks related to these entities.

The Fault Manager attempts to automate as many activities as possible, so use of fmadm is typically not required. When the Fault Manager needs help from an administrator, service repair technician, or Oracle, it produces a message indicating its needs. It also refers you to a knowledge article on the Oracle web site. The web site might ask you to use fmadm or one of the other fault management utilities to gather more information or perform additional tasks. The fmd(8), fmdump(8), and fmstat(8) man pages and the Securing Systems and Attached Devices in Oracle Solaris 11.4 guide describe other tools for observing fault management activities.

One responsibility of the Fault Manager is to keep track of the location of components. At the chassis level, the fmadm *-alias subcommands manage the mapping from a chassis identifier, chassis-name.chassis-serial, to an alias-id. The administered alias-id is intended to describe the physical location of a chassis.

The fmadm utility requires the user to be assigned the solaris.fm.read RBAC authorization ("Fault Management" or "Fault Information" RBAC profile) for read operations, or the solaris.fm.modify RBAC authorization ("Fault Management" RBAC profile) for modify operations. The fmadm load subcommand requires that the user possess all privileges.
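Before running a modify operation, it can be useful to confirm that the needed authorization is present. The following is a minimal sketch, not part of fmadm. It assumes that the authorizations are available as a comma-separated list (for example, as printed by auths(1)) and that an entry ending in an asterisk, such as solaris.fm.*, grants everything under that prefix. The function name has_auth is hypothetical.

```shell
# has_auth AUTH LIST
# Succeeds if AUTH is granted by the comma-separated LIST, either by an exact
# entry or by a wildcard entry ending in '*' (for example, solaris.fm.*).
has_auth() {
    want=$1
    list=$2
    saved_ifs=$IFS
    IFS=,
    set -f                      # do not glob-expand entries such as solaris.*
    found=1
    for granted in $list; do
        case $granted in
            "$want")
                found=0 ;;
            *'*')
                # Wildcard entry: compare the part before the asterisk.
                prefix=${granted%'*'}
                case $want in
                    "$prefix"*) found=0 ;;
                esac ;;
        esac
    done
    set +f
    IFS=$saved_ifs
    return $found
}

# Possible use (assumes auths(1) is available; not run here):
#   has_auth solaris.fm.modify "$(auths)" || echo "modify operations will fail"
```

The check is advisory only. fmadm itself remains the authority on whether an operation is permitted.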

SUBCOMMANDS

The fmadm command accepts the following subcommands. Some of the subcommands accept or require additional options and operands. The acquit, load, unload, repaired, replaced, reset, and rotate subcommands are intended for trained technical personnel. Use of these subcommands without the specific guidance of, for example, a Knowledge Base article is not recommended.

fmadm acquit fmri | label [uuid]

Notify the Fault Manager that the specified resource should not be considered a suspect in the event identified by uuid or, if no uuid is specified, in any faults, defects, or alerts that have been detected. The fmadm acquit command should be used only at the direction of a documented Oracle repair procedure. Administrators might need to apply additional commands to re-enable a previously faulted resource.

fmadm acquit uuid

Notify the Fault Manager that the event identified by uuid can be safely ignored. The fmadm acquit command should be used only at the direction of a documented Oracle repair procedure. Administrators might need to apply additional commands to re-enable any previously faulted resources.

fmadm add-alias chassis-name.chassis-serial alias-id ['comment']

The add-alias subcommand is used to establish alias-id as a managed alias for the chassis-name.chassis-serial chassis. When a managed alias is defined, the /dev/chassis devchassis(4FS) name space representation of the chassis will use the more meaningful alias-id instead of the chassis-name.chassis-serial.

# fmadm add-alias SUN-Storage-J4410.1039QAQ007 RACK29.U25-28

The command shown above will verify that the new mapping does not conflict with existing mappings. In the case of conflict, no mapping change occurs. This command completes when the associated name space updates are complete. If the updated name space does not use the new alias-id, a warning is printed, but the mapping is updated. If the name space update takes too long, a warning is printed.

The add-alias subcommand also accepts "SYS" as an identifier for the main chassis, in place of chassis-name.chassis-serial.

If an optional comment is provided, the comment is preserved and will be displayed by a subsequent lookup-alias or list-alias command. See also remove-alias and sync-alias.
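When several chassis need aliases, the add-alias syntax shown above can be generated from a list. The following sketch only prints the commands so that they can be reviewed before anything is changed. The input layout (chassis identifier, alias-id, then an optional free-text comment, separated by white space) and the function name print_add_alias_cmds are assumptions made for this illustration, not an fmadm file format.

```shell
# print_add_alias_cmds
# Reads lines of the (hypothetical) form
#     chassis-name.chassis-serial alias-id [comment words...]
# on standard input and prints one "fmadm add-alias" command per line.
# Blank lines and lines starting with '#' are skipped.
# Limitation: a comment containing a single quote is not escaped.
print_add_alias_cmds() {
    while read -r chassis alias_id comment; do
        case $chassis in
            ''|'#'*) continue ;;
        esac
        if [ -z "$alias_id" ]; then
            echo "skipping '$chassis': no alias-id given" >&2
            continue
        fi
        if [ -n "$comment" ]; then
            printf "fmadm add-alias %s %s '%s'\n" "$chassis" "$alias_id" "$comment"
        else
            printf "fmadm add-alias %s %s\n" "$chassis" "$alias_id"
        fi
    done
}
```

Because add-alias refuses a mapping that conflicts with an existing one, reviewing the printed commands against the fmadm list-alias output first is a reasonable precaution.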

fmadm clear label | uuid | class@resource

Notify the Fault Manager that any alert events associated with the specified location label or uuid or identified by class@resource should be cleared. This command can only be applied to an alert, not to a defect or fault.

fmadm config

Display the configuration of the Fault Manager itself, including the module name, version, and description of each component module. Fault Manager modules provide services such as automated diagnosis, self-healing, and messaging for hardware and software present on the system.

fmadm faulty [-afprsv] [-u uuid]

This command is an alias for the fmadm list command.

fmadm flush fmri | label

Flush the information cached by the Fault Manager for the specified resource, for any faults, defects, or alerts for which the resource has already been repaired, acquitted or replaced.

fmadm list [-afprsv] [-u uuid]

Display status information for resources that the Fault Manager currently believes to be associated with a fault, defect, or alert. See also list-alert, list-defect, and list-fault subcommands.

The following options are supported:

-a

Display all resources that might be associated with a fault, defect, or alert. By default, the fmadm list command only lists output for resources for which a fault, defect, or alert is still active. If you specify the -a option, all resource information cached by the Fault Manager is listed, including information for resources that might have already been acquitted or repaired, or might no longer be present in the system.

-f

Display FRUs (Field Replaceable Units) that are associated with a fault, defect, or alert.

-p

Pipe output through a pager, with a form feed between events.

-r

Display affected Solaris resources with their Fault Management Resource Identifier (FMRI) and their fault management state.

-s

Display a one-line summary for each event.

-u uuid

Only display the event with the given uuid.

-v

Display full output.

The percentage certainty is displayed if an event has multiple suspects, either of different classes or on different FRUs. If more than one resource is on the same FRU and it is not 100% certain that the event is associated with the FRU, the maximum percentage certainty of the possible suspects on the FRU is displayed.
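As a worked illustration of the rule above, the following sketch computes the per-FRU maximum from a list of suspects. The input layout (one suspect per line, as a label followed by a percentage) and the function name max_certainty_per_fru are hypothetical; fmadm does not produce or consume this format. A POSIX awk is assumed.

```shell
# max_certainty_per_fru
# Input (hypothetical format): one suspect per line as "<fru-label> <percent>".
# Output: one line per FRU with the highest certainty among its suspects,
# in order of first appearance. Labels must not contain white space.
max_certainty_per_fru() {
    awk '
        # First suspect seen for this FRU: remember its order and certainty.
        !($1 in max) { order[++n] = $1; max[$1] = $2; next }
        # Later suspect on the same FRU: keep the larger certainty.
        $2 + 0 > max[$1] + 0 { max[$1] = $2 }
        END { for (i = 1; i <= n; i++) print order[i], max[order[i]] }
    '
}
```

For example, if two suspects on one FRU carry certainties of 33 and 50 percent, the FRU is reported with 50 percent.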

The Fault Manager associates one of the following states with every FRU that has been diagnosed as having a fault, defect, or alert.

faulty

The FRU has been diagnosed by the Fault Manager as being associated with a fault, defect, or alert, and is still present in the system.

faulty/not_present

The FRU has been diagnosed by the Fault Manager as being associated with a fault, defect, or alert and is no longer present in the system.

replaced

The FRU has been removed from the system and a replacement FRU has taken its place.

repaired

The command fmadm repaired has been used to notify the Fault Manager that the FRU has been repaired.

acquitted

The command fmadm acquit has been used to notify the Fault Manager that the FRU has been acquitted.

The state of any affected Solaris resources is also displayed. For a faulty FRU, the state of the associated resources can be one of:

  • Faulted and taken out of service

  • Faulted but still in service

  • Faulted but still providing degraded service

For a FRU that has been repaired, replaced or acquitted, the state of the associated resources can be one of:

  • Okay and in service

  • Out of service, but associated components no longer faulty

  • Service degraded, but associated components no longer faulty
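A script that tracks FRU states might group the five states above into those that still indicate a problem and those that record a completed action. The following is a sketch under that assumption. The function name fru_is_still_faulty and the return-value convention are hypothetical.

```shell
# fru_is_still_faulty STATE
# Returns 0 for the two states in which the FRU is still diagnosed as faulty,
# 1 for replaced, repaired, or acquitted, and 2 for any string that is not
# one of the five documented FRU states.
fru_is_still_faulty() {
    case $1 in
        faulty|faulty/not_present) return 0 ;;
        replaced|repaired|acquitted) return 1 ;;
        *) return 2 ;;
    esac
}
```

Note that a FRU in the replaced, repaired, or acquitted state can still have associated resources that are out of service or degraded, as listed above, so the FRU state alone does not show that service has been restored.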

fmadm list-alert [-afprsv] [-u uuid]

This subcommand behaves like list, but only displays suspect lists that contain an alert event.

fmadm list-alias

The list-alias subcommand displays all alias mappings and their comments. See also add-alias, lookup-alias, remove-alias, and sync-alias subcommands.

fmadm list-defect [-afprsv] [-u uuid]

This subcommand behaves like list, but only displays suspect lists that contain a defect event.

fmadm list-fault [-afprsv] [-u uuid]

This subcommand behaves like list, but only displays suspect lists that contain a fault event.

fmadm load path

Load the specified Fault Manager module. The path must be an absolute path and must refer to a module present in one of the defined directories for modules. Typically, the use of this command is not necessary because the Fault Manager loads modules automatically when Solaris initially boots or as needed. See also fmadm unload.

fmadm lookup-alias alias-id | chassis-name.chassis-serial

The lookup-alias subcommand can be used to determine what the current mapping is. The following is an example command.

# fmadm lookup-alias SUN-Storage-J4410.1039QAQ007

See also add-alias, list-alias, remove-alias, and sync-alias.

fmadm remove-alias alias-id | chassis-name.chassis-serial

The fmadm remove-alias subcommand is used to remove a chassis-name.chassis-serial to alias-id mapping.

# fmadm remove-alias RACK29.U25-28

The command shown above completes when the associated name space updates are complete. See also add-alias, list-alias, lookup-alias, and sync-alias.

fmadm repaired fmri | label

Notify the Fault Manager that a repair procedure has been carried out on the specified resource. The fmadm repaired command should be used only at the direction of a documented Oracle repair procedure. Administrators might need to apply additional commands to re-enable a previously faulted resource.

fmadm replaced fmri | label

Notify the Fault Manager that the specified resource has been replaced. This command should be used in those cases in which the Fault Manager is unable to automatically detect the replacement. The fmadm replaced command should be used only at the direction of a documented Oracle repair procedure. Administrators might need to apply additional commands to re-enable a previously faulted resource.

fmadm reset [-s serd] module

Reset the specified Fault Manager module or module subcomponent. If the -s option is present, the specified Soft Error Rate Discrimination (SERD) engine is reset within the module. If the -s option is not present, the entire module is reset and all persistent state associated with the module is deleted. The fmadm reset command should only be used at the direction of a documented Oracle repair procedure. The use of this command is typically not necessary because the Fault Manager manages its modules automatically.

fmadm rotate errlog | fltlog | infolog | infolog_hival

The fmadm rotate command is a helper command for the logadm(8) command, so that logadm can rotate live log files correctly. The fmadm rotate command is not intended to be invoked directly (and invoking it directly is likely to lose log history). Use one of the following commands to cause the appropriate logfile to be rotated, if the current logfile is not zero in size:

# logadm -p now -s 1b /var/fm/fmd/errlog
# logadm -p now -s 1b /var/fm/fmd/fltlog
# logadm -p now -s 1b /var/fm/fmd/infolog
# logadm -p now -s 1b /var/fm/fmd/infolog_hival

fmadm sync-alias

The sync-alias subcommand is used to hand-import a set of mappings in bulk. Two copies of the current mappings are maintained:

  • /etc/dev/chassis_aliases

  • /etc/dev/.chassis_aliases

To import a set of mappings in bulk, you can update the /etc/dev/chassis_aliases file and then run fmadm sync-alias. See also add-alias, list-alias, lookup-alias, and remove-alias.

fmadm unload module

Unload the specified Fault Manager module. Specify module using the basename listed in the fmadm config output. Typically, the use of this command is not necessary because the Fault Manager loads and unloads modules automatically based on the system configuration. See also fmadm load.

Options

The following options are supported:

-q

Set quiet mode. In quiet mode, fmadm does not write messages reporting the result of successful operations to standard output.

Operands

The following operands are supported:

subcommand

The name of a subcommand listed in SUBCOMMANDS.

arguments

One or more options or arguments appropriate for the selected subcommand, as described in SUBCOMMANDS. Among these arguments are fmri, uuid, and label. These identify resources that are the objects of fmadm subcommands. Use fmadm list to obtain the fmri, uuid, and label for a targeted resource. In general, label is the most user-friendly of these operands. See the Examples section below.

Exit Status

The following exit values are returned:

0

Successful completion

1

An error occurred. Errors include a failure to communicate with fmd or insufficient RBAC authorization to perform the requested operation.

2

Invalid command-line options were specified
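Scripts that call fmadm can branch on these three values. The following sketch maps an exit status to a short description. The function name explain_fmadm_status and the wording of the messages are hypothetical; only the meanings of 0, 1, and 2 come from the list above.

```shell
# explain_fmadm_status STATUS
# Prints a short description of an fmadm exit status, using the three values
# documented above. Any other value is reported as undocumented.
explain_fmadm_status() {
    case $1 in
        0) echo "success" ;;
        1) echo "error: could not talk to fmd, insufficient authorization, or other failure" ;;
        2) echo "usage: invalid command-line options" ;;
        *) echo "undocumented exit status: $1" ;;
    esac
}

# Possible use (not run here):
#   fmadm config >/dev/null; explain_fmadm_status $?
```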

Examples

Example 1 Invoking the list Subcommand

The following command invokes the list subcommand, which displays the uuid, label, and fmri for a component.

# fmadm list
------------ ------------------------------------  ------------ ---------
TIME         EVENT-ID                              MSG-ID       SEVERITY
------------ ------------------------------------  ------------ ---------
Sep 09 16:15 96609fae-113c-e48c-b1cf-ebf4b0902d72  DISK-8000-3E Critical
                                                                
Problem Status  : open
Diag Engine     : eft / 1.16 
System
   Manufacturer : Oracle-Corp.
   Name         : SUN-FIRE-X4170-SERVER
   Part Number  : unknown
   Serial Number: 0920XF508B

----------------------------------------
Suspect 1 of 1:
  Fault class: fault.io.scsi.cmd.disk.dev.rqs.derr
  Certainty  : 100% 
  Affects    : dev:///:devid=id1,sd@n5000c5000940edbb//scsi_vhci/disk@g\
                 5000c5000940edbb
  Status     : faulted and taken out of service
  
   FRU  
      Status            : faulty
      Location         : "/SUN-Storage-J4410.1037QAQ052/HDD11"
      Location Alias   : "/RACK29.U25-28/HDD11"
      Manufacturer      : SEAGATE
      Name              : ST330057SSUN300G
      Part Number       : SEAGATE-ST330057SSUN300G
      Revision          : 0205 
      Serial Number     : 000930G01CN4----3SJ01CN4
      Chassis
         Manufacturer   : Oracle-Corp.
         Name           : SUN-Storage-J4410
         Part Number    : 594-5329
         Serial Number  : 1037QAQ052
      ...
      ...

In the preceding output, the uuid is the first item in the EVENT-ID column, 96609fae-113c-e48c-b1cf-ebf4b0902d72. The label is in the FRU section in the Location line, "/SUN-Storage-J4410.1037QAQ052/HDD11". In this example, an alias for the chassis has been set, and the aliased location is displayed in the Location Alias line, "/RACK29.U25-28/HDD11".

The fmris are available with fmdump -v:

# fmdump -v
Sep 09 16:15:36.9252 96609fae-113c-e48c-b1cf-ebf4b0902d72 DISK-8000-3E \
Diagnosed 100%  fault.io.scsi.cmd.disk.dev.rqs.derr

Problem in: hc://:scheme=:chassis-mfg=Oracle-Corp.:chassis-name=SUN-\ 
Storage-J4410:chassis-part=594-5329:chassis-serial=1037QAQ052/ses-\ 
enclosure=0/bay=11/disk=0

Affects: dev:///:devid=id1,sd@n5000c5000940edbb//\ 
scsi_vhci/disk@g5000c5000940edbb
FRU: hc://chassis-mfg=Oracle-Corp.:chassis-name=SUN-Storage-J4410\ 
:chassis-part=594-5329:chassis-serial=1037QAQ052:fru-mfg=SEAGATE\ 
:fru-name=SEAGATE-ST330057SSUN300G:fru-part=SEAGATE-ST330057SSUN300G\ 
:fru-revision=0205:fru-serial=000930G01CN4--------3SJ01CN4/\
ses-enclosure=0/bay=11/disk=0

FRU Location: "/SUN-Storage-J4410.1037QAQ052/HDD11"

Note that label is the easiest-to-use identifier. Either the aliased or the non-aliased form of the Location may be used.
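The uuid and label can also be picked out of saved fmadm list output with a small filter. The following is a sketch only: the human-readable output is not a committed interface, so the layout assumed here (a uuid as a white-space-separated field, and a quoted value on the Location line) is taken from the example above and might differ on other systems or releases. The function names list_event_ids and list_fru_labels are hypothetical, and a POSIX awk is assumed.

```shell
# list_event_ids: print every field on stdin that looks like a uuid
# (36 characters, hex digits, hyphens at positions 9, 14, 19, and 24).
list_event_ids() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (length($i) == 36 &&
                $i ~ /^[0-9a-fA-F-]+$/ && $i ~ /[0-9a-fA-F]/ &&
                substr($i, 9, 1) == "-" && substr($i, 14, 1) == "-" &&
                substr($i, 19, 1) == "-" && substr($i, 24, 1) == "-")
                print $i
    }'
}

# list_fru_labels: print the quoted value of each "Location :" line,
# ignoring "Location Alias :" lines.
list_fru_labels() {
    awk '$1 == "Location" && $2 == ":" {
        s = $0
        sub(/^[^"]*"/, "", s)    # drop everything up to the opening quote
        sub(/"[^"]*$/, "", s)    # drop the closing quote and anything after
        print s
    }'
}

# Possible use (not run here):
#   fmadm list | list_event_ids
#   fmadm list | list_fru_labels
```

The second test on the uuid pattern excludes the rows of hyphens that separate the header from the events, one of which is also 36 characters long.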

Example 2 Obtaining the Module Name

The following command displays the module name for each component. The module name is specified as input to the fmadm unload command.

# fmadm config
MODULE                   VERSION STATUS  DESCRIPTION
cpumem-retire            1.1     active  CPU/Memory Retire Agent
disk-transport           1.0     active  Disk Transport Agent
eft                      1.16    active  eft diagnosis engine
..
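To collect only the module names, for example to pass one to fmadm unload, the first column can be extracted. The following sketch assumes the column layout shown above, which is human-readable output and not a committed interface. The function name config_module_names is hypothetical.

```shell
# config_module_names: print the MODULE column of "fmadm config" output read
# from stdin. The header row and any line with fewer than three fields
# (such as an elision marker) are skipped.
config_module_names() {
    awk '$1 != "MODULE" && NF >= 3 { print $1 }'
}

# Possible use (not run here):
#   fmadm config | config_module_names
```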

Attributes

See attributes(7) for descriptions of the following attributes:

ATTRIBUTE TYPE         ATTRIBUTE VALUE
Availability           system/fault-management
Interface Stability    See below

The command-line options are Committed. The human-readable output is not-an-interface.

See Also

fmd(8), fmdump(8), fmstat(8), logadm(8), syslogd(8), attributes(7), devchassis(4FS)

Securing Systems and Attached Devices in Oracle Solaris 11.4

Notes

Oracle Solaris FMA does not perform, and is not actively involved in, SMART failure analysis or prediction. It reads the SMART data reported by the disk.

A third-party utility such as SMARTCTL can be used to view more SMART information; this access is read-only. The thresholds are determined by the disk manufacturer and can vary from one disk make, model, or firmware level to another. HDD manufacturers do not publish SMART threshold information, although SMARTCTL might be able to display the values.

Once a disk asserts a SMART failure prediction or warning, it must be replaced. You cannot turn off SMART failure prediction. Note that it is a matter of time before the disk fails completely.

While it is possible to tell Oracle Solaris FMA to ignore a specific SMART failure event by using the fmadm acquit command, doing so is not recommended. Again, once a disk asserts a SMART failure, that state cannot be reset, and the disk must be replaced.

Keep your system software and firmware (including SAS controller and disk firmware) up to date so that the system has the best available capabilities.