Working with Problems
To aid serviceability, Oracle ZFS Storage Appliance detects persistent hardware failures (faults) and software failures (defects, often included under faults) and reports them as active problems on the Maintenance: Problems page in the BUI, and in the maintenance problems context in the CLI.
If the Phone Home service is enabled, active problems are automatically reported to Oracle Support, where a support case might be opened, depending on the service contract and the nature of the fault. Problem notification can be suspended while you are servicing Oracle ZFS Storage Appliance.
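The following topics are described in this section: Viewing Active Problems, Repairing Active Problems, and Suspending and Resuming Problem Notification.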
Viewing Active Problems
The following table shows some example faults as they would be displayed in the Active Problems section of the Maintenance: Problems page in the BUI. For each problem, Oracle ZFS Storage Appliance reports what happened, when the problem was detected, the severity and type of the problem, and whether the problem has been phoned home. Severity can be Minor, Major, or Critical. Type can be Alert, Defect, Error, or Fault. Phoned Home is a date and time or Never. The table can be sorted by Date.
Table 1-9 Example BUI Problem Displays
Date | Description | Type | Phoned Home |
---|---|---|---|
2022-09-16 13:56:36 | SMART health-monitoring firmware reported that a disk failure is imminent. | Major Fault | Never |
2022-09-05 17:42:55 | A disk of a different type (cache, log, or data) was inserted into a slot. The newly inserted device must be of the same type. | Minor Fault | Never |
2022-08-21 16:40:37 | The ZFS pool has experienced currently unrecoverable I/O failures. | Major Error | Never |
2022-07-16 22:03:22 | A memory module is experiencing excessive correctable errors affecting large numbers of pages. | Major Fault | Never |
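
The Type column in the table combines the severity and the type of each problem. As an illustration only, the following Python sketch splits one table row into its parts and checks them against the values listed above. The function name `parse_problem_row` and the dictionary layout are hypothetical, and the sketch assumes that a Phoned Home timestamp uses the same format as the Date column.

```python
from datetime import datetime

# Allowed values as listed in the text above.
SEVERITIES = ("Minor", "Major", "Critical")
TYPES = ("Alert", "Defect", "Error", "Fault")
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # format used in the example table


def parse_problem_row(date_text, description, type_text, phoned_home):
    """Turn one row of the example table into a dictionary (hypothetical helper)."""
    # The Type column holds two words, for example "Major Fault".
    severity, problem_type = type_text.split()
    if severity not in SEVERITIES or problem_type not in TYPES:
        raise ValueError(f"unexpected severity or type: {type_text!r}")
    return {
        "date": datetime.strptime(date_text, DATE_FORMAT),
        "description": description,
        "severity": severity,
        "type": problem_type,
        # Assumption: a phoned-home timestamp has the same format as Date.
        "phoned_home": None if phoned_home == "Never"
        else datetime.strptime(phoned_home, DATE_FORMAT),
    }


rows = [
    parse_problem_row("2022-08-21 16:40:37",
                      "The ZFS pool has experienced currently unrecoverable I/O failures.",
                      "Major Error", "Never"),
    parse_problem_row("2022-09-16 13:56:36",
                      "SMART health-monitoring firmware reported that a disk failure is imminent.",
                      "Major Fault", "Never"),
]
# Newest problem first, similar to sorting the BUI table by Date.
rows.sort(key=lambda row: row["date"], reverse=True)
```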
Clicking on a problem shows more information about the problem in the Problem Details section of the page, including the impact to the system, affected components, the system's automated response (if any), and the recommended action for the administrator (if any).
To view the affected hardware component for a hardware fault and to optionally turn on its locator LED on Oracle ZFS Storage Appliance, see Locating a Failed Component - BUI, CLI.
The CLI provides similar information, as shown in the following example:
    hostname:maintenance problems> show
    Problems:

    COMPONENT     DIAGNOSED           TYPE          DESCRIPTION
    problem-000   2022-4-3 20:30:12   Major Fault   A sensor indicates that the power
                                                    supply '1235FM401W/PSU 01' is not
                                                    operating properly due to some
                                                    external condition.

    problem-001   2022-4-3 17:53:58   Major Fault   External sensors indicate that the
                                                    power supply 'hostname/PSU 1' is no
                                                    longer operating correctly.
For more information, select a problem. Only the uuid, diagnosed, severity, type, and description fields are considered to be stable. Other property values might change in a new release.
    hostname:maintenance problems> select problem-000
    hostname:maintenance problem-000> show
    Properties:
                        uuid = uuid
                        code = SENSOR-8000-7L
                   diagnosed = 2022-4-3 20:30:12
                 phoned_home = never
                    severity = Major
                        type = Fault
                         url = https://support.oracle.com/msg/SENSOR-8000-7L
                 description = A sensor indicates that the power supply
                               '1235FM401W/PSU 01' is not operating properly
                               due to some external condition.
                      impact = The enclosure may be getting inadequate power.
                               Subsequent loss of power supplies may force the
                               enclosure to shutdown.
                    response = None.
                      action = Check to see if the power cord is connected
                               properly or if there are other conditions that
                               may be causing inadequate power to be provided
                               to the indicated power supply. Please refer to
                               the associated reference document at
                               https://support.oracle.com/msg/SENSOR-8000-7L
                               for the latest service procedures and policies
                               regarding this diagnosis.

    Components:

    component-000  100%  1235FM401W: PSU 01 (degraded)
                         Manufacturer: Oracle
                         Part number: part-number
                         Serial number: serial-number

    hostname:maintenance problem-000> select component-000
    hostname:maintenance problem-000 component-000> show
    Properties:
                   certainty = 100
                      status = degraded
               chassis_label = 1235FM401W
             component_label = PSU 01
                manufacturer = Oracle
                        part = part-number
                      serial = serial-number
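
Because only the uuid, diagnosed, severity, type, and description fields are described as stable, a script that consumes this output might restrict itself to those fields. The following Python sketch shows one possible way to do that. It is not an Oracle-provided tool: the function name `parse_properties` is hypothetical, and the sketch assumes that the captured text uses the `name = value` layout shown in the example, with long values wrapped onto following lines.

```python
import re

# Fields that the text above describes as stable across releases.
STABLE_FIELDS = ("uuid", "diagnosed", "severity", "type", "description")

# Assumption: each property starts a line as "name = value".
PROPERTY = re.compile(r"^\s*(\w+) = (.*)$")


def parse_properties(text):
    """Parse captured 'show' output into a dictionary (hypothetical helper)."""
    props = {}
    current = None
    for line in text.splitlines():
        match = PROPERTY.match(line)
        if match:
            current = match.group(1)
            props[current] = match.group(2).strip()
        elif current and line.strip() and not line.rstrip().endswith(":"):
            # Wrapped continuation of the previous property value.
            props[current] += " " + line.strip()
        else:
            # Blank line or a section header such as "Components:".
            current = None
    return props


SAMPLE = """Properties:
                    uuid = uuid
                    code = SENSOR-8000-7L
               diagnosed = 2022-4-3 20:30:12
             phoned_home = never
                severity = Major
                    type = Fault
             description = A sensor indicates that the power supply
                           '1235FM401W/PSU 01' is not operating properly
                           due to some external condition.

Components:
"""

props = parse_properties(SAMPLE)
# Keep only the fields that are expected to remain stable.
stable = {name: props[name] for name in STABLE_FIELDS if name in props}
```

Fields such as code and phoned_home are still parsed, but leaving them out of `stable` keeps a script from depending on values that might change in a new release.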
Related Topics
- Persistent logs of all faults, defects, errors, and alerts are available under Maintenance: Logs in the BUI, and under maintenance logs in the CLI. For more information, see Using Logs.
- Faults and defects are subcategories of alerts. Filter rules can be configured to cause Oracle ZFS Storage Appliance to email administrators or perform other actions when faults are detected. For more information about alerts, see Configuring Alerts in Oracle ZFS Storage Appliance Administration Guide, Release OS8.8.x.
Repairing Active Problems
Active problems can be a result of a hardware fault or software defect. To repair an active problem, perform the steps described in the suggested action section. For hardware faults, repair typically involves replacing a physical component. For software defects, repair typically involves reconfiguring and restarting the affected service.
After a problem is repaired, the problem no longer appears in the list of active problems.
While the system can detect repairs automatically, in some cases manual intervention is required. If a problem persists after you complete the suggested action, contact Oracle Support. You might be instructed to mark the problem as repaired. Manually mark a problem as repaired only under the direction of Oracle service personnel or as part of a documented Oracle repair procedure.
Suspending and Resuming Problem Notification
Servicing the appliance can generate false failures. For example, replacing a disk generates FRU remove and Invalid Configuration events, which can generate service requests (SRs).
To avoid sending SRs when no problem exists, you can suspend all notifications during the period when you are performing the service.
Suspending Problem Notification
To suspend all notifications, do one of the following:
- BUI – Check the Suspend Notifications box at the top of the Maintenance: Problems page.
- CLI – Enable the suspend_notification property in maintenance problems.

    hostname:maintenance problems> ls
    Properties:
        suspend_notification = disabled
                      period =
The period property is read-only. As in the BUI, it displays the remaining amount of time that notifications will be suspended.
To enable or disable notification suspension, the user must be assigned the maintenance authorization in the Appliance scope.
Notification suspension behaves in the following way:
- All external notifications are suspended, including the following:
  - Phone Home
  - Emails
  - Any user-configured alert actions, as described in Configuring Alerts in Oracle ZFS Storage Appliance Administration Guide, Release OS8.8.x
- If you suspend notifications for one node of a cluster, notifications are suspended for both cluster nodes.
- While notifications are suspended, events continue to be logged and will be sent when event notification is resumed. See Resuming Problem Notification.
- By default, notifications are suspended for 8 hours, or for a period of 480 minutes.
- While notifications are suspended, a persistent minor alert is displayed in the Active Problems section of the Maintenance: Problems BUI page, or in the Problems section of maintenance problems: "The suspending of notifications has started."
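
To plan a service window against the default suspension period, the arithmetic can be sketched as follows. This Python example is an illustration only: it does not query the appliance, the function name `remaining_minutes` is hypothetical, and the way the appliance formats the period property is not assumed here.

```python
from datetime import datetime, timedelta

# Default suspension period from the text above: 8 hours, or 480 minutes.
DEFAULT_PERIOD_MINUTES = 480


def remaining_minutes(started, now, period_minutes=DEFAULT_PERIOD_MINUTES):
    """Return the whole minutes left in a suspension window (hypothetical helper)."""
    ends = started + timedelta(minutes=period_minutes)
    remaining = (ends - now).total_seconds() / 60
    # Never report a negative value once the window has ended.
    return max(0, int(remaining))


started = datetime(2022, 9, 16, 9, 0)
# Two and a half hours into the window, 330 minutes remain.
left = remaining_minutes(started, datetime(2022, 9, 16, 11, 30))
```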
Resuming Problem Notification
While notifications are suspended, events continue to be logged and will be sent when event notification is resumed.
Note:
Before you resume normal problem notification, clear any problem events that should not be sent to Oracle. The only accumulated events remaining in the Problems BUI page or in the maintenance problems CLI context should be problems that still need to be corrected and that need to be sent to Oracle for further action.
To end notification suspension and resume normal problem notification prior to the end of the default suspension period, do one of the following:
- BUI – From the Maintenance menu, select Problems, and clear the Suspend Notifications box.
- CLI – Disable the suspend_notification property in maintenance problems.
To enable or disable notification suspension, the user must be assigned the maintenance authorization in the Appliance scope.