Managing Faults, Defects, and Alerts in Oracle® Solaris 11.3

Updated: March 2018
 
 

Displaying Information About Alerts

An alert is information of interest that is neither a fault nor a defect. An alert might report a problem or might be simply informational. A problem that is reported by an alert is a misconfiguration or other problem that the administrator can resolve without assistance from a response agent. An example of this type of problem is a DIMM plugged into the wrong slot. An example of an informational message reported by an alert is a message that a shadow migration has completed. The following list provides examples of alert messages:

  • Threshold alerts – Temperature is high, storage is at capacity, a zpool is at 80% or 90% capacity, a quota is exceeded, the path count to a chassis or disk has changed. These kinds of alerts can predict a performance impact.

  • Configuration checks – An FRU has been added or removed, SAS cabling is incorrect, a DIMM is plugged into the wrong slot, a datalink changed, a link went up or down, ILOM is misconfigured, MTU (Maximum Transmission Unit - TCP/IP) is misconfigured.

  • Interesting events – A reboot occurred, file system events occurred, firmware has been upgraded, save core failed, ZFS deduplication failed, shadow migration completed.

Alerts can be in one of the following states:

  • active – The alert has not been cleared.

  • cleared – The alert has been cleared. The cleared state for alerts can be compared to the resolved state for faults and defects. See the following description of persistent and transient alerts for more information about clearing an alert.

Alerts can be persistent or transient.

  • A persistent alert is active until it is manually cleared, as shown in "fmadm clear Command".

  • A transient alert clears after a specified timeout period or is cleared by a service such as a network monitor.
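The states and the persistent/transient distinction described above can be sketched as a small model. This is only a conceptual illustration in Python, not how the fault manager stores alerts: the `Alert` class, its fields, and the message IDs `EXAMPLE-PERSISTENT` and `EXAMPLE-TRANSIENT` are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    """Hypothetical model of an alert; not an actual FMA data structure."""
    msg_id: str
    raised_at: float                 # time the alert was raised, in seconds
    timeout: Optional[float] = None  # None means persistent; a number means transient
    cleared: bool = False

    @property
    def persistent(self) -> bool:
        # A persistent alert has no timeout and stays active until cleared.
        return self.timeout is None

    def clear(self) -> None:
        """Explicit clear: by the administrator, or by a service for a transient alert."""
        self.cleared = True

    def state(self, now: float) -> str:
        """Return 'active' or 'cleared' as of time `now` (same clock as raised_at)."""
        if self.cleared:
            return "cleared"
        if self.timeout is not None and now - self.raised_at >= self.timeout:
            return "cleared"  # transient alert whose timeout period has elapsed
        return "active"


# Illustrative instances with made-up message IDs.
persistent_alert = Alert("EXAMPLE-PERSISTENT", raised_at=0.0)
transient_alert = Alert("EXAMPLE-TRANSIENT", raised_at=0.0, timeout=60.0)

for alert in (persistent_alert, transient_alert):
    print(alert.msg_id, alert.persistent, alert.state(now=3600.0))
```

In this model a persistent alert remains active no matter how much time passes, while a transient alert reports cleared once its timeout has elapsed or once something calls `clear()`.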


Tip  -  Base your administrative action on output from the fmadm list-alert command. Log files output by the fmdump command contain a historical record of events and do not necessarily present active or open diagnoses. Log files output by fmdump -i are a historical record of telemetry and might not have been diagnosed into alerts.

Example 7  fmadm list-alert Output

Use the fmadm list-alert command to list all alerts that have not been cleared. The following alert shows that a disk has been removed from the system. The Problem Status has the value open, which is an active state. Problem Status can be open, isolated, repaired, or resolved. The Problem class indicates that the FRU has been removed. The Impact indicates that the severity of the impact depends on the importance of this device in your environment. Perhaps the most useful piece of information in this output is the MSG-ID, in this case FMD-8000-CV. Follow the instructions in the Action section at the end of the alert to access more information about FMD-8000-CV.

# fmadm list-alert
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Apr 23 02:15:12 a7921317-8ba2-4ab1-b1c3-b0fb8822c000  FMD-8000-CV    Minor

Problem Status    : open
Diag Engine       : software-diagnosis / 0.1
System
    Manufacturer  : Oracle Corporation
    Name          : Sun Netra X4270 M3
    Part_Number   : NILE-P1LRQT-8
    Serial_Number : 1211FM200D

System Component
    Manufacturer  : Oracle
    Name          : Sun Netra X4270 M3
    Part_Number   : NILE-P1LRQT-8
    Serial_Number : 1211FM200D
    Host_ID       : 008167b1

----------------------------------------
Suspect 1 of 1 :
   Problem class : alert.oracle.solaris.fmd.fru-monitor.fru-remove
   Certainty   : 100%

   FRU
     Status           : faulty/not present
     Location         : "/SUN-Storage-J4410.1051QCQ08A/HDD13"
     Manufacturer     : SEAGATE
     Name             : ST330057SSUN300G
     Part_Number      : SEAGATE-ST330057SSUN300G
     Revision         : 0B25
     Serial_Number    : 001117G1LC1S--------6SJ1LC1S
     Chassis
        Manufacturer  : SUN
        Name          : SUN-Storage-J4410
        Part_Number   : 3753659
        Serial_Number : 1051QCQ08A
   Resource
     Status           : faulty/not present

Description : FRU '/SUN-Storage-J4410.1051QCQ08A/HDD13' has been removed from
              the system.

Response    : FMD topology will be updated.

Impact      : System impact depends on the type of FRU.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
              Please refer to the associated reference document at
              http://support.oracle.com/msg/FMD-8000-CV for the latest service
              procedures and policies regarding this diagnosis.
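
If you need to pull the MSG-ID and status out of this kind of output in a script, the following is a minimal Python sketch that works on captured text. It assumes the layout shown in the example above (a summary row under the TIME/EVENT-ID/MSG-ID/SEVERITY header, followed by "Problem Status" and "Problem class" lines); other releases or alert types might format the output differently. The names `parse_alerts`, `article_url`, and `SAMPLE` are hypothetical, and the URL pattern is taken from the Action text in the example.

```python
import re

# Abbreviated copy of the example output; a real script would capture
# the text produced by the fmadm list-alert command instead.
SAMPLE = """\
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Apr 23 02:15:12 a7921317-8ba2-4ab1-b1c3-b0fb8822c000  FMD-8000-CV    Minor

Problem Status    : open
Diag Engine       : software-diagnosis / 0.1

Suspect 1 of 1 :
   Problem class : alert.oracle.solaris.fmd.fru-monitor.fru-remove
   Certainty   : 100%
"""

# Summary row: timestamp, UUID-style event ID, message ID, severity.
SUMMARY_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<event_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<msg_id>\S+)\s+(?P<severity>\S+)\s*$"
)
# Only the two labeled fields discussed in the text are extracted.
FIELD_RE = re.compile(r"^\s*(Problem Status|Problem class)\s*:\s*(\S+)\s*$")


def parse_alerts(text):
    """Return one dict per summary row, with status and class attached."""
    alerts = []
    for line in text.splitlines():
        summary = SUMMARY_RE.match(line)
        if summary:
            alerts.append(summary.groupdict())
            continue
        field = FIELD_RE.match(line)
        if field and alerts:
            key = field.group(1).lower().replace(" ", "_")
            alerts[-1][key] = field.group(2)
    return alerts


def article_url(msg_id):
    """Build the reference URL in the form shown in the Action text."""
    return "http://support.oracle.com/msg/" + msg_id


for alert in parse_alerts(SAMPLE):
    print(alert["msg_id"], alert["severity"], alert.get("problem_status"))
    print(article_url(alert["msg_id"]))
```

Parsing human-readable output is fragile by nature, so treat a script like this as a convenience for triage rather than a replacement for reading the full alert.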