An alert is information of interest that is neither a fault nor a defect. An alert might report a problem or might be purely informational. A problem reported by an alert is a misconfiguration or other problem that the administrator can resolve without assistance from a response agent. An example of this type of problem is a DIMM plugged into the wrong slot. An example of an informational alert is a message that a shadow migration has completed. The following list provides examples of alert messages:
Threshold alerts – Temperature is high, storage is at capacity, a zpool is at 80% or 90% capacity, a quota is exceeded, the path count to a chassis or disk has changed. These kinds of alerts can predict a performance impact.
Configuration checks – An FRU has been added or removed, SAS cabling is incorrect, a DIMM is plugged into the wrong slot, a datalink changed, a link went up or down, ILOM is misconfigured, the MTU (maximum transmission unit) is misconfigured.
Interesting events – A reboot occurred, file system events occurred, firmware has been upgraded, savecore failed, ZFS deduplication failed, shadow migration completed.
If an application that is signed by Oracle terminates abnormally, a diagnostic core is saved and an alert is generated. See COREDIAG Alerts.
Alerts can be in one of the following states:
active – The alert has not been cleared.
cleared – The alert has been cleared. The cleared state for alerts can be compared to the resolved state for faults and defects. See the following description of persistent and transient alerts for more information about clearing an alert.
Alerts can be persistent or transient.
A persistent alert remains active until it is manually cleared, as described in fmadm clear Command.
A transient alert is cleared automatically, either when a specified timeout period expires or by a service such as a network monitor.
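The alert lifecycle described above can be summarized in a small model. The following Python sketch is illustrative only: the Alert class, its fields, and the numeric clock are hypothetical and are not part of FMA or any Oracle interface. It shows that a persistent alert stays active until something clears it, while a transient alert also becomes cleared once its timeout has elapsed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    """Hypothetical model of an FMA alert's state; not an Oracle API."""

    event_id: str
    raised_at: float                 # time the alert was raised, in seconds
    persistent: bool                 # True: stays active until explicitly cleared
    timeout: Optional[float] = None  # only used for transient alerts
    cleared: bool = False

    def clear(self) -> None:
        """Clear the alert, e.g. manually or by a monitoring service."""
        self.cleared = True

    def state(self, now: float) -> str:
        """Return 'active' or 'cleared' at time `now`."""
        if self.cleared:
            return "cleared"
        # Only transient alerts can clear themselves by timing out.
        expired = (
            not self.persistent
            and self.timeout is not None
            and now - self.raised_at >= self.timeout
        )
        return "cleared" if expired else "active"


persistent = Alert("hypothetical-1", raised_at=0.0, persistent=True)
transient = Alert("hypothetical-2", raised_at=0.0, persistent=False, timeout=300.0)
print(persistent.state(now=3600.0))  # active
print(transient.state(now=3600.0))   # cleared (timeout elapsed)
persistent.clear()
print(persistent.state(now=3600.0))  # cleared
```

In this model, clear() stands in both for a manual fmadm clear and for a clear issued by a service such as a network monitor.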
Use the fmadm list-alert command to list all alerts that have not been cleared. The following alert shows that a disk has been removed from the system. The Problem Status value is open, which is an active state; Problem Status can be open, isolated, repaired, or resolved. The Problem class indicates that an FRU has been removed. The Impact field states that the system impact depends on the type of FRU; in practice, the severity depends on how important this device is in your environment. Perhaps the most useful piece of information in this output is the MSG-ID, FMD-8000-CV. Follow the instructions in the Action field at the end of the alert to find more information about this message ID.
# fmadm list-alert
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Apr 23 02:15:12 a7921317-8ba2-4ab1-b1c3-b0fb8822c000 FMD-8000-CV Minor
Problem Status : open
Diag Engine : software-diagnosis / 0.1
System
Manufacturer : Oracle Corporation
Name : Sun Netra X4270 M3
Part_Number : NILE-P1LRQT-8
Serial_Number : 1211FM200D
System Component
Manufacturer : Oracle
Name : Sun Netra X4270 M3
Part_Number : NILE-P1LRQT-8
Serial_Number : 1211FM200D
Host_ID : 008167b1
----------------------------------------
Suspect 1 of 1 :
Problem class : alert.oracle.solaris.fmd.fru-monitor.fru-remove
Certainty : 100%
FRU
Status : faulty/not present
Location : "/SUN-Storage-J4410.1051QCQ08A/HDD13"
Manufacturer : SEAGATE
Name : ST330057SSUN300G
Part_Number : SEAGATE-ST330057SSUN300G
Revision : 0B25
Serial_Number : 001117G1LC1S--------6SJ1LC1S
Chassis
Manufacturer : SUN
Name : SUN-Storage-J4410
Part_Number : 3753659
Serial_Number : 1051QCQ08A
Resource
Status : faulty/not present
Description : FRU '/SUN-Storage-J4410.1051QCQ08A/HDD13' has been removed from
the system.
Response : FMD topology will be updated.
Impact : System impact depends on the type of FRU.
Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Please refer to the associated reference document at
http://support.oracle.com/msg/FMD-8000-CV for the latest service
procedures and policies regarding this diagnosis.
If an application that is signed by Oracle terminates abnormally, a diagnostic core is saved and an alert is generated. See Configuring Reporting of Diagnostic Core Dumps for options to change this default reporting behavior.
A diagnostic core is smaller than a global core because only the information relevant to the failed application, such as its stack and environment variables, is saved. A diagnostic core has two parts: a core file (core.diag) and a core summary file (core.json). These two files are placed in /var/share/diag/uuid, where uuid is a universally unique identifier (UUID) that identifies that diagnostic core, for example 1de0f8bc-d4f6-416e-843c-efba9f9edb65. The /var/diag directory is a link to /var/share/diag, so the same files can also be reached under /var/diag.
The core files are purged periodically by coremond, so that only the summary files remain. You can use options of the coreadm command or properties of the coreadm:default service to change the policy for retaining the files, specify a different location for them, and modify other settings.
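To see which diagnostic cores still have their core.diag file and which have already been reduced to a summary by coremond, you could scan the diag directory. The following Python sketch assumes the layout described above (one uuid directory per diagnostic core, containing core.diag and core.json) and assumes that each uuid directory name is a UUID string, as in the /var/diag/1de0f8bc-d4f6-416e-843c-efba9f9edb65 path in the COREDIAG example. Directories with other names, such as the path-to-binary index, are skipped. The diag_inventory function and its status labels are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict

# A uuid directory name, e.g. 1de0f8bc-d4f6-416e-843c-efba9f9edb65
UUID_RE = re.compile(r"^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def diag_inventory(root: Path = Path("/var/share/diag")) -> Dict[str, str]:
    """Classify each uuid directory under root by which core files remain."""
    inventory: Dict[str, str] = {}
    if not root.is_dir():
        return inventory
    for entry in sorted(root.iterdir()):
        # Skip files, links, and the path-to-binary index (e.g. root/usr).
        if entry.is_symlink() or not entry.is_dir() or not UUID_RE.match(entry.name):
            continue
        has_core = (entry / "core.diag").is_file()
        has_summary = (entry / "core.json").is_file()
        if has_core and has_summary:
            inventory[entry.name] = "core+summary"
        elif has_summary:
            inventory[entry.name] = "summary-only"  # core.diag already purged
        else:
            inventory[entry.name] = "incomplete"
    return inventory


if __name__ == "__main__":
    for uuid, status in diag_inventory().items():
        print(uuid, status)
```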
The /var/share/diag/path-to-binary directory contains links to /var/share/diag/uuid directories for that binary, which makes it easier to associate core files with applications. For example, if /usr/bin/vim terminated abnormally three times, the directory /var/share/diag/usr/bin/vim would contain links to /var/share/diag/uuid-1, /var/share/diag/uuid-2, and /var/share/diag/uuid-3.
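The path-to-binary index can also be followed programmatically. This sketch assumes, as described above, that /var/share/diag/path-to-binary contains symbolic links whose targets are the uuid directories for that binary; it makes no assumption about the names of the links themselves. The cores_for_binary function is hypothetical.

```python
from pathlib import Path
from typing import List


def cores_for_binary(binary: str, root: Path = Path("/var/share/diag")) -> List[str]:
    """Return the names of the uuid directories linked from root/<path-to-binary>."""
    index = root / binary.lstrip("/")  # e.g. /usr/bin/vim -> root/usr/bin/vim
    if not index.is_dir():
        return []
    uuids = []
    for link in index.iterdir():
        if link.is_symlink():
            # Assumption: every link targets a root/<uuid> directory.
            uuids.append(link.resolve().name)
    return sorted(uuids)


if __name__ == "__main__":
    print(cores_for_binary("/usr/bin/vim"))
```

With the vim example above, cores_for_binary("/usr/bin/vim") would return the names uuid-1, uuid-2, and uuid-3.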
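The following example shows a core diagnostic alert from a system running as a VirtualBox guest. The application that terminated abnormally is /usr/lib/picl/picld, which belongs to the svc:/system/picl:default service: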
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Nov 04 21:06:16 1c9c8afa-036d-4eb3-a97f-a17298b20fa9 COREDIAG-8000-1V Major
Problem Status : open
Diag Engine : software-diagnosis / 0.2
System
Manufacturer : unknown
Name : unknown
Part_Number : unknown
Serial_Number : unknown
System Component
Manufacturer : innotek GmbH
Name : VirtualBox
Part_Number :
Serial_Number : 0
Firmware_Manufacturer : innotek GmbH
Firmware_Version : (BIOS)VirtualBox
Firmware_Release : (BIOS)12.01.2006
Host_ID : 008953e5
----------------------------------------
Suspect 1 of 1 :
Problem class : alert.oracle.solaris.utility.corediag.dump_available
Certainty : 100%
Resource
FMRI : "sw:///:path=/usr/lib/picl/picld#:token=0fed5e879996dfc053f62f6736a01cb432f0b7d92f653beef1b587a5e0019483"
Status : Active
Description : A diagnostic core file was dumped in
/var/diag/1de0f8bc-d4f6-416e-843c-efba9f9edb65 for RESOURCE
/usr/lib/picl/picld whose ASRU is svc:/system/picl:default. The
ASRU is the Service FMRI for the resource and will be NULL if the
resource is not part of a service. The following are potential
bugs.
stack[1] - 15760557 22191243 22551744
Response : The diagnostic core file will be removed and a json format core
data summary file will be generated in
/var/diag/1de0f8bc-d4f6-416e-843c-efba9f9edb65.
Impact : The program may not be working properly.
Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Please refer to the associated reference document at
http://support.oracle.com/msg/COREDIAG-8000-1V for the latest
service procedures and policies regarding this diagnosis.
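Because the Description field wraps across lines, extracting the core location and the affected resource from a COREDIAG alert is easiest after collapsing whitespace. The following sketch assumes the wording shown in this example ("dumped in DIR for RESOURCE PATH whose ASRU is FMRI.", followed by stack[n] lines that list potential bug numbers); the parse_corediag function and its keys are hypothetical, and other releases might phrase the message differently.

```python
import re
from typing import Dict, List, Optional

DUMP_RE = re.compile(
    r"dumped in (?P<diag_dir>\S+) for RESOURCE (?P<resource>\S+) "
    r"whose ASRU is (?P<asru>\S+?)\.(?:\s|$)"
)
# e.g. "stack[1] - 15760557 22191243 22551744"
STACK_RE = re.compile(r"stack\[\d+\] - (?P<ids>\d+(?: \d+)*)")


def parse_corediag(text: str) -> Optional[Dict[str, object]]:
    """Extract diag directory, resource, ASRU, and potential bugs from alert text."""
    flat = " ".join(text.split())  # undo the line wrapping in the Description field
    match = DUMP_RE.search(flat)
    if not match:
        return None
    info: Dict[str, object] = dict(match.groupdict())
    if info["asru"] == "NULL":
        info["asru"] = None  # the resource is not part of a service
    bugs: List[str] = []
    for stack in STACK_RE.finditer(flat):
        bugs.extend(stack.group("ids").split())
    info["potential_bugs"] = bugs
    return info
```

Applied to the alert above, it would report /var/diag/1de0f8bc-d4f6-416e-843c-efba9f9edb65 as the diag directory, /usr/lib/picl/picld as the resource, and svc:/system/picl:default as the ASRU.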