An alert is information of interest that is neither a fault nor a defect. An alert might report a problem or might be simply informational. A problem that is reported by an alert is a misconfiguration or other problem that the administrator can resolve without assistance from a response agent. An example of this type of problem is a DIMM plugged into the wrong slot. An example of an informational message reported by an alert is a message that a shadow migration has completed. The following list provides examples of alert messages:
Threshold alerts – Temperature is high, storage is at capacity, a zpool is at 80% or 90% capacity, a quota is exceeded, the path count to a chassis or disk has changed. These kinds of alerts can predict a performance impact.
Configuration checks – An FRU has been added or removed, SAS cabling is incorrect, a DIMM is plugged into the wrong slot, a datalink changed, a link went up or down, ILOM is misconfigured, the MTU (maximum transmission unit) is misconfigured.
Interesting events – A reboot occurred, file system events occurred, firmware has been upgraded, save core failed, ZFS deduplication failed, shadow migration completed.
If an application that is signed by Oracle terminates abnormally, a diagnostic core is saved and an alert is generated. See COREDIAG Alerts.
Alerts can be in one of the following states:
active – The alert has not been cleared.
cleared – The alert has been cleared. The cleared state for alerts can be compared to the resolved state for faults and defects. See the following description of persistent and transient alerts for more information about clearing an alert.
Alerts can be persistent or transient.
A persistent alert is active until it is manually cleared as shown in fmadm clear Command.
A transient alert clears after a specified timeout period or is cleared by a service such as a network monitor.
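The relationship between the two states and the two kinds of alerts can be sketched as a small model. This is an illustration only: the Alert class, its fields, and the timeout value are hypothetical and are not part of the Fault Management Architecture or of any fmadm interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    """Hypothetical in-memory model of an alert; not an FMA data structure."""

    msg_id: str
    raised_at: float                 # time the alert was raised, in seconds
    timeout: Optional[float] = None  # None = persistent, a number = transient
    cleared: bool = False

    def clear(self) -> None:
        """Model a manual clear, or a clear issued by a monitoring service."""
        self.cleared = True

    def state(self, now: float) -> str:
        """Return 'active' or 'cleared' as of time `now`."""
        if self.cleared:
            return "cleared"
        # A transient alert clears by itself once its timeout has elapsed.
        if self.timeout is not None and now - self.raised_at >= self.timeout:
            return "cleared"
        # A persistent alert stays active until clear() is called.
        return "active"
```

In this model, a persistent alert reports active no matter how much time passes, while a transient alert with a 60-second timeout reports cleared once 60 seconds have elapsed.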
Use the fmadm list-alert command to list all alerts that have not been cleared. The following alert shows that a disk has been removed from the system. The Problem Status has the value open, which is an active state. Problem Status can be open, isolated, repaired, or resolved. The Problem class indicates that the FRU has been removed. The Impact indicates that the severity of the impact depends on the importance of this device in your environment. Perhaps the most useful piece of information in this output is the MSG-ID. Follow the instructions in the Action at the end of the alert to access more information about FMD-8000-CV.
# fmadm list-alert
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Apr 23 02:15:12 a7921317-8ba2-4ab1-b1c3-b0fb8822c000 FMD-8000-CV    Minor

Problem Status    : open
Diag Engine       : software-diagnosis / 0.1
System
    Manufacturer  : Oracle Corporation
    Name          : Sun Netra X4270 M3
    Part_Number   : NILE-P1LRQT-8
    Serial_Number : 1211FM200D

System Component
    Manufacturer  : Oracle
    Name          : Sun Netra X4270 M3
    Part_Number   : NILE-P1LRQT-8
    Serial_Number : 1211FM200D
    Host_ID       : 008167b1

----------------------------------------
Suspect 1 of 1 :
   Problem class : alert.oracle.solaris.fmd.fru-monitor.fru-remove
   Certainty   : 100%

   FRU
     Status           : faulty/not present
     Location         : "/SUN-Storage-J4410.1051QCQ08A/HDD13"
     Manufacturer     : SEAGATE
     Name             : ST330057SSUN300G
     Part_Number      : SEAGATE-ST330057SSUN300G
     Revision         : 0B25
     Serial_Number    : 001117G1LC1S--------6SJ1LC1S
     Chassis
        Manufacturer  : SUN
        Name          : SUN-Storage-J4410
        Part_Number   : 3753659
        Serial_Number : 1051QCQ08A
   Resource
     Status           : faulty/not present

Description : FRU '/SUN-Storage-J4410.1051QCQ08A/HDD13' has been removed from the system.

Response    : FMD topology will be updated.

Impact      : System impact depends on the type of FRU.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
              Please refer to the associated reference document at
              http://support.oracle.com/msg/FMD-8000-CV for the latest service
              procedures and policies regarding this diagnosis.
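Because the MSG-ID determines which reference document to consult, it can be convenient to pull the summary row out of the fmadm list-alert output with a script. The following is a minimal sketch that assumes the output has the column layout shown in this section, with one summary row per line. The summary_rows function and the ROW pattern are illustrative names, not part of any Oracle tool.

```python
import re
from typing import Dict, List

# One summary row, for example:
# "Apr 23 02:15:12 a7921317-8ba2-4ab1-b1c3-b0fb8822c000 FMD-8000-CV Minor"
ROW = re.compile(
    r"^(?P<time>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<event_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<msg_id>[A-Z0-9]+(?:-[A-Z0-9]+)+)\s+"
    r"(?P<severity>\w+)\s*$"
)


def summary_rows(output: str) -> List[Dict[str, str]]:
    """Extract TIME, EVENT-ID, MSG-ID, and SEVERITY from each summary row."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            # The Action text in the alert points to this URL pattern.
            row["article"] = "http://support.oracle.com/msg/" + row["msg_id"]
            rows.append(row)
    return rows
```

The function takes the command output as a string, for example text captured from fmadm list-alert by a privileged user, and ignores header, separator, and detail lines.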
If an application that is signed by Oracle terminates abnormally, a diagnostic core is saved and an alert is generated. See Configuring Reporting of Diagnostic Core Dumps for options to change this default reporting behavior.
A diagnostic core is smaller than a global core because only the relevant information about the particular application is saved, such as the stack and environment variables. A diagnostic core has two parts: a core file (core.diag) and a core summary file (core.json). These two files are placed in /var/share/diag/uuid, where uuid is a universally unique identifier (UUID) that identifies that diagnostic core. The /var/diag path is a link to the /var/share/diag directory.
The core files are purged periodically by coremond so that only the summary files remain. You can use options of the coreadm command or properties of the coreadm:default service to modify the policy for retaining the files, specify a different location for the files, and modify other configuration.
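To see which diagnostic cores still have their core file and which have been reduced to the summary file, you can scan the diagnostic directory. The following sketch relies only on the file names given above (core.diag and core.json). The summarize_diag_dirs function is a hypothetical helper, and the directory to scan is passed in because the location can be changed.

```python
from pathlib import Path
from typing import Dict, List


def summarize_diag_dirs(root: Path) -> Dict[str, List[str]]:
    """Group the per-core directories under `root` by which files remain."""
    result: Dict[str, List[str]] = {"core_and_summary": [], "summary_only": []}
    for entry in sorted(root.iterdir()):
        # Skip links and plain files; only real directories hold cores.
        if entry.is_symlink() or not entry.is_dir():
            continue
        has_summary = (entry / "core.json").is_file()
        has_core = (entry / "core.diag").is_file()
        if has_summary and has_core:
            result["core_and_summary"].append(entry.name)
        elif has_summary:
            # The core file is gone and only the summary remains.
            result["summary_only"].append(entry.name)
        # Directories without core.json (such as per-binary link
        # directories) are ignored.
    return result
```

For example, summarize_diag_dirs(Path("/var/share/diag")) returns a dictionary with two lists of directory names.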
The /var/share/diag/path-to-binary directory contains links to /var/share/diag/uuid directories for that binary, which makes it easier to associate core files with applications. For example, if /usr/bin/vim terminated abnormally three times, the directory /var/share/diag/usr/bin/vim would contain links to /var/share/diag/uuid-1, /var/share/diag/uuid-2, and /var/share/diag/uuid-3.
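The per-binary layout can be walked with a short script to find every diagnostic core directory for one application. The following sketch assumes that the entries in the per-binary directory are symbolic links, as described above. The diag_dirs_for_binary function is a hypothetical helper, and the names of the links themselves are not assumed.

```python
from pathlib import Path
from typing import List


def diag_dirs_for_binary(
    binary: str, root: Path = Path("/var/share/diag")
) -> List[Path]:
    """Return the core directories that the links for `binary` point to."""
    # /usr/bin/vim maps to <root>/usr/bin/vim
    per_binary = root / binary.lstrip("/")
    if not per_binary.is_dir():
        return []  # no diagnostic cores recorded for this binary
    return sorted(
        link.resolve() for link in per_binary.iterdir() if link.is_symlink()
    )
```

For example, diag_dirs_for_binary("/usr/bin/vim") would return the three uuid directories from the example above if vim had terminated abnormally three times.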
The following example is a core diagnostic alert for the /usr/lib/picl/picld program on a system that runs in VirtualBox:
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Nov 04 21:06:16 1c9c8afa-036d-4eb3-a97f-a17298b20fa9 COREDIAG-8000-1V Major

Problem Status    : open
Diag Engine       : software-diagnosis / 0.2
System
    Manufacturer  : unknown
    Name          : unknown
    Part_Number   : unknown
    Serial_Number : unknown

System Component
    Manufacturer  : innotek GmbH
    Name          : VirtualBox
    Part_Number   :
    Serial_Number : 0
    Firmware_Manufacturer : innotek GmbH
    Firmware_Version : (BIOS)VirtualBox
    Firmware_Release : (BIOS)12.01.2006
    Host_ID       : 008953e5

----------------------------------------
Suspect 1 of 1 :
   Problem class : alert.oracle.solaris.utility.corediag.dump_available
   Certainty   : 100%

   Resource
     FMRI             : "sw:///:path=/usr/lib/picl/picld#:token=0fed5e879996dfc053f62f6736a01cb432f0b7d92f653beef1b587a5e0019483"
     Status           : Active

Description : A diagnostic core file was dumped in
              /var/diag/1de0f8bc-d4f6-416e-843c-efba9f9edb65 for RESOURCE
              /usr/lib/picl/picld whose ASRU is svc:/system/picl:default.
              The ASRU is the Service FMRI for the resource and will be NULL
              if the resource is not part of a service. The following are
              potential bugs.
              stack[1] - 15760557 22191243 22551744

Response    : The diagnostic core file will be removed and a json format core
              data summary file will be generated in
              /var/diag/1de0f8bc-d4f6-416e-843c-efba9f9edb65.

Impact      : The program may not be working properly.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
              Please refer to the associated reference document at
              http://support.oracle.com/msg/COREDIAG-8000-1V for the latest
              service procedures and policies regarding this diagnosis.
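The Description field of a COREDIAG alert names the directory that holds the diagnostic core, the program that failed, its service FMRI, and any candidate bug IDs. The following sketch extracts those values, assuming the Description wording matches the example in this section. The parse_corediag_description function and both patterns are illustrative, and wording in other releases might differ.

```python
import re
from typing import Dict, Optional

DUMP = re.compile(
    r"A diagnostic core file was dumped in (?P<core_dir>\S+) "
    r"for RESOURCE (?P<resource>\S+) "
    r"whose ASRU is (?P<asru>\S+?)\.(?:\s|$)"
)
BUGS = re.compile(r"stack\[\d+\] - (?P<ids>\d+(?: \d+)*)")


def parse_corediag_description(text: str) -> Optional[Dict[str, object]]:
    """Return core_dir, resource, asru, and bug_ids, or None if no match."""
    # Collapse line wrapping so the sentence can be matched as one string.
    flat = " ".join(text.split())
    match = DUMP.search(flat)
    if match is None:
        return None
    info: Dict[str, object] = dict(match.groupdict())
    # Collect the bug IDs listed after each "stack[n] -" marker.
    info["bug_ids"] = [
        bug_id
        for found in BUGS.finditer(flat)
        for bug_id in found.group("ids").split()
    ]
    return info
```

Pass the function only the text of the Description field. It returns None for text that does not contain the expected sentence, so alerts of other classes are skipped rather than misread.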