Oracle® ZFS Storage Appliance Customer Service Manual For ZS3-x, 7x20 Controllers, and DE2-24, Sun Disk Shelves
Chapter 2 Hardware Maintenance
Following the application of a software upgrade, any hardware for which the upgrade includes newer versions of firmware is upgraded. There are several types of devices for which firmware upgrades may be made available; each has distinct characteristics.
Disks, storage enclosures, and certain internal SAS devices are upgraded in the background. While this is occurring, the firmware upgrade progress is displayed in the left panel of the Maintenance > System BUI view, or in the maintenance system updates CLI context. These firmware updates are almost always hardware related, though the display may briefly show some number of outstanding updates when certain deferred updates are applied to components other than hardware.
As of 2010Q3.4, when there are outstanding updates, an informational or warning icon appears next to the number of updates remaining. Clicking the icon brings up the Firmware Updates dialog, which lists the current remaining updates. For each update we also show the current version of the component, the time of the last attempted update, and the reason why the last attempt did not succeed.
We consider any outstanding update to be in one of three states: Pending, In Progress, or Failed. An update begins in the Pending state and is periodically retried, at which time it moves into the In Progress state. If the attempt fails because of a transient condition, the update is moved back to the Pending state; if it fails for any other reason, it is moved to the Failed state.
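The state transitions described above can be sketched as a small state machine. This is only an illustrative model, not the appliance's implementation: the names `UpdateState`, `start_attempt`, and `finish_attempt` are hypothetical, and the choice to represent a completed update as `None` (no longer outstanding) is an assumption.

```python
from enum import Enum
from typing import Optional


class UpdateState(Enum):
    """The three states an outstanding update can be in (per the text)."""
    PENDING = "Pending"
    IN_PROGRESS = "In Progress"
    FAILED = "Failed"


def start_attempt(state: UpdateState) -> UpdateState:
    """A periodic retry moves a Pending update into In Progress."""
    if state is not UpdateState.PENDING:
        raise ValueError(f"cannot start an attempt from {state.value}")
    return UpdateState.IN_PROGRESS


def finish_attempt(state: UpdateState, succeeded: bool,
                   transient: bool = False) -> Optional[UpdateState]:
    """Resolve an In Progress attempt.

    Returns None when the update succeeded (assumption: a completed
    update is simply no longer outstanding). A transient failure goes
    back to Pending for a later retry; any other failure goes to Failed.
    """
    if state is not UpdateState.IN_PROGRESS:
        raise ValueError(f"no attempt is running in state {state.value}")
    if succeeded:
        return None
    return UpdateState.PENDING if transient else UpdateState.FAILED
```

For example, an update that hits a transient condition returns to `UpdateState.PENDING` and can be passed to `start_attempt` again on the next retry, while a non-transient failure ends in `UpdateState.FAILED`.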
In general, it is only an indication of a problem if:
There are updates in the Failed state.
Updates remain in the Pending state (or in limbo between the Pending and In Progress states) for an extended period of time (more than half an hour), without the number of remaining updates decreasing.
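The two problem indicators above can be expressed as a simple check over periodic observations of the update counts. This is a rough sketch under stated assumptions: the `Snapshot` record and the `indicates_problem` function are hypothetical, the observations are assumed to be collected by hand or by some external script, and "more than half an hour" is taken literally as 30 minutes.

```python
from dataclasses import dataclass
from typing import List

# "More than half an hour" from the text, expressed in seconds.
STALL_THRESHOLD_SECONDS = 30 * 60


@dataclass
class Snapshot:
    """One observation of the outstanding updates (hypothetical shape)."""
    timestamp: float  # seconds since an arbitrary starting point
    remaining: int    # number of updates still outstanding
    failed: int       # number of updates in the Failed state


def indicates_problem(history: List[Snapshot]) -> bool:
    """Return True if the observations match either problem indicator."""
    if not history:
        return False
    latest = history[-1]
    # Indicator 1: any update is in the Failed state.
    if latest.failed > 0:
        return True
    if latest.remaining == 0:
        return False
    # Indicator 2: the remaining count has not changed for too long.
    # Walk backwards to find when the current count was first observed.
    stalled_since = latest.timestamp
    for snap in reversed(history):
        if snap.remaining != latest.remaining:
            break
        stalled_since = snap.timestamp
    return latest.timestamp - stalled_since > STALL_THRESHOLD_SECONDS
```

The check does not account for the expected conditions listed next (for example, disks that are not part of any pool), so a `True` result is a prompt to look at the status messages, not a diagnosis.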
The following conditions do not indicate a problem:
Disk firmware updates are shown as pending for extended periods of time, with a status message indicating that the disks are not part of any pool. This is expected, given that we only update firmware for disks that are part of a pool. To update these disks, add them to a pool.
There are multiple chassis being updated, we are making progress (the number of remaining updates decreases), and some of the chassis transiently appear pending with a status indicating that some disk has only one path. This is also expected, since when we update a chassis, we may reset one of its expanders. Resetting an expander causes some disks to temporarily have only one path, and as a result, upgrades to other chassis are held back until it is safe to do so again non-disruptively.
Note that the Firmware Updates dialog currently does not refresh automatically, so you must close and reopen it to get an updated view.
Applying hardware updates is always done in a completely safe manner. This means that the system may be in a state where hardware updates cannot be applied. This is particularly important in the context of clustered configurations. During takeover and failback operations, any in-progress firmware upgrade is completed; pending firmware upgrades are suspended until the takeover or failback has completed, at which time the restrictions described below are reevaluated in the context of the new cluster state and, if possible, firmware upgrades resume.
Caution - Unless absolutely necessary, takeover and failback operations should not be performed while firmware upgrades are in progress.
The rolling upgrade procedure documented later meets all of these best practices and addresses the per-device-class restrictions described later. It should always be followed when performing upgrades in a clustered environment. In both clustered and standalone environments, these criteria are also reevaluated upon any reboot or diagnostic system software restart, which may cause previously suspended or incomplete firmware upgrades to resume.
Components internal to the storage controller (such as HBAs and network devices) other than disks and certain SAS devices are generally upgraded automatically during boot; these upgrades are not visible and will have completed by the time the management interfaces become available.
Upgrading disk or flash device firmware requires that the device be taken offline during the process. If there is insufficient redundancy in the containing storage pool to allow this operation, the firmware upgrade will not complete and may appear "stalled." Disks and flash devices that are part of a storage pool which is currently in use by the cluster peer, if any, are not upgraded. Finally, disks and flash devices that are not part of any storage pool are not upgraded.
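The restrictions in the preceding paragraph amount to three checks per device. The sketch below models them for illustration only; the `Disk` record, its field names, and the status strings are hypothetical and do not correspond to any appliance API or to the exact status messages the BUI displays.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disk:
    """Hypothetical description of a disk or flash device."""
    name: str
    pool: Optional[str]                   # None if not in any storage pool
    pool_in_use_by_peer: bool = False     # pool currently used by cluster peer
    pool_tolerates_offline: bool = True   # enough redundancy to offline it


def disk_upgrade_status(disk: Disk) -> str:
    """Classify a device according to the restrictions described above."""
    if disk.pool is None:
        return "not upgraded: not part of any storage pool"
    if disk.pool_in_use_by_peer:
        return "not upgraded: pool in use by cluster peer"
    if not disk.pool_tolerates_offline:
        return "stalled: insufficient redundancy in pool"
    return "eligible"
```

For instance, `disk_upgrade_status(Disk("disk-3", pool=None))` reports that the device is skipped because it belongs to no pool, which matches the expected "pending" condition described earlier.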
Upgrading the firmware in a disk shelf requires that both back-end storage paths be active to all disks within all enclosures, and for storage to be configured on all shelves to be upgraded. For clusters with at least one active pool on each controller, these restrictions mean that disk shelf firmware upgrade can be performed only by a controller that is in the "owner" state.
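The disk shelf preconditions can be read as two checks: every disk in every enclosure must have both back-end paths active, and every shelf to be upgraded must have storage configured. The following is a minimal sketch of that reading; the `Shelf` record and `shelf_upgrade_blockers` function are hypothetical names introduced for illustration, and the cluster "owner" state requirement is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shelf:
    """Hypothetical description of one disk shelf (enclosure)."""
    name: str
    storage_configured: bool
    # Number of active back-end paths for each disk in the shelf.
    active_paths_per_disk: List[int] = field(default_factory=list)


def shelf_upgrade_blockers(all_shelves: List[Shelf],
                           to_upgrade: List[Shelf]) -> List[str]:
    """Return the reasons a shelf firmware upgrade cannot proceed.

    An empty list means both preconditions from the text are met.
    """
    blockers = []
    # Both paths must be active to all disks within ALL enclosures.
    for shelf in all_shelves:
        if any(paths < 2 for paths in shelf.active_paths_per_disk):
            blockers.append(
                f"{shelf.name}: a disk has fewer than two active paths")
    # Storage must be configured on every shelf that is to be upgraded.
    for shelf in to_upgrade:
        if not shelf.storage_configured:
            blockers.append(f"{shelf.name}: no storage configured")
    return blockers
```

Note that a single-path disk in any enclosure blocks upgrades to every shelf, which is why resetting an expander on one chassis temporarily holds back upgrades to the others.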
During the firmware upgrade process, hardware may appear to be removed and inserted, or offlined and onlined. While alerts attributed to these actions are suppressed, if you are viewing the Maintenance > Hardware screen or the Configuration > Storage screen, you may see the effects of these upgrades in the UI in the form of missing or offline devices. This is not a cause for concern; however, if a device remains offline or missing for an extended period of time (several minutes or more) even after refreshing the hardware view, this may be an indication of a problem with the device. Check the Maintenance > Problems view for any relevant faults that may have been identified.
Additionally, in some cases, the controllers in the disk shelves may remain offline during firmware upgrade. If this occurs, no other controllers are updated until this condition is fixed.
If an enclosure is listed as only having a single path for an extended period of time, check the physical enclosure to determine whether the green link lights on the back of the SIM are active. If not, remove and re-insert the SIM to re-establish the connection. Verify that all enclosures are reachable by two paths.