Oracle® ZFS Storage Appliance Customer Service Manual

Updated: March 2017
 
 

Working with Firmware Updates

Following the application of a software update, any hardware for which the update includes newer versions of firmware is upgraded. Before the upgrade window, it is recommended to run a scrub as described in Scrubbing a Storage Pool (BUI) in Oracle ZFS Storage Appliance Administration Guide, Release OS8.7.0.

There are several types of devices for which firmware updates may be made available; each has distinct characteristics. Disks, storage enclosures, and certain internal SAS devices are upgraded in the background. While this is occurring, the firmware update progress is displayed in the left panel of the Maintenance > System BUI view, or in the maintenance system updates CLI context. These firmware updates are almost always hardware related, though the display may briefly show some outstanding updates while certain deferred updates are applied to components other than hardware.

For clustered controllers, the status only shows the updates that are pending on the local controller. For example, firmware updates displayed on the peer controller do not include firmware updates for the primary controller.

As of 2010Q3.4, when there are outstanding updates, an informational or warning icon appears next to the number of updates remaining. Clicking the icon brings up the Firmware Updates dialog, which lists the remaining updates. For each update, the dialog also shows the current version of the component, the time of the last attempted update, and the reason why the last attempt did not succeed.

An outstanding update is in one of three states: Pending, In Progress, or Failed. An update begins in the Pending state and is periodically retried, at which time it moves into the In Progress state. If the upgrade attempt fails because of a transient condition, the update moves back to the Pending state; if it fails for any other reason, the update moves to the Failed state.
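The state transitions described above can be sketched as a small lookup table. This is an illustrative model only, not appliance code: the names `UpdateState`, `TRANSITIONS`, and `next_state`, and the event strings, are hypothetical. A successful update is not modeled, on the assumption that it simply drops out of the list of remaining updates.

```python
from enum import Enum


class UpdateState(Enum):
    """The three states named in the text (hypothetical identifiers)."""
    PENDING = "Pending"
    IN_PROGRESS = "In Progress"
    FAILED = "Failed"


# Only the transitions the text describes. Event names are assumptions
# made for this sketch; they are not appliance terminology.
TRANSITIONS = {
    (UpdateState.PENDING, "retry"): UpdateState.IN_PROGRESS,
    (UpdateState.IN_PROGRESS, "transient_failure"): UpdateState.PENDING,
    (UpdateState.IN_PROGRESS, "other_failure"): UpdateState.FAILED,
}


def next_state(state, event):
    """Return the state that follows `event`, or raise if the text
    describes no such transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"no transition described for {state.value!r} on {event!r}"
        ) from None


if __name__ == "__main__":
    state = UpdateState.PENDING
    for event in ("retry", "transient_failure", "retry", "other_failure"):
        state = next_state(state, event)
        print(f"{event} -> {state.value}")
```

Keeping the transitions in a table makes it easy to see that an update reaches the Failed state only from In Progress, and only through a non-transient failure.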

In general, there is only an indication of a problem if:

  • There are updates in the Failed state.

  • Updates remain in the Pending state (or in limbo between the Pending and In Progress states) for an extended period of time (more than half an hour), without the number of remaining updates decreasing.

The following condition does not indicate a problem:

  • Multiple chassis are being upgraded, progress is being made (the number of remaining updates decreases), and some of the chassis transiently appear pending with a status indicating that some disk has only one path. This is expected: upgrading a chassis may reset one of its expanders, and resetting an expander causes some disks to temporarily have only one path. As a result, updates to other chassis are held back until they can again be applied non-disruptively.
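The problem indicators listed above can be expressed as a simple check. The following is a minimal sketch under stated assumptions: the function `indicates_problem` and its inputs are hypothetical, and the samples are assumed to be counts of remaining updates that an operator has noted by hand at various times (for example, from the Maintenance > System view). It does not query the appliance.

```python
def indicates_problem(failed_count, samples, stall_minutes=30):
    """Apply the two problem indicators from the text.

    failed_count  -- number of updates currently in the Failed state
    samples       -- list of (minutes_elapsed, remaining_updates) pairs,
                     oldest first (hypothetical manual observations)
    stall_minutes -- the "more than half an hour" threshold from the text
    """
    # Indicator 1: any update in the Failed state.
    if failed_count > 0:
        return True

    # Nothing observed, or nothing left to apply: no problem indicated.
    if not samples or samples[-1][1] == 0:
        return False

    # Indicator 2: remaining count has not decreased for too long.
    last_progress = samples[0][0]
    for (_, prev_remaining), (when, remaining) in zip(samples, samples[1:]):
        if remaining < prev_remaining:
            last_progress = when  # progress was made at this observation

    return samples[-1][0] - last_progress > stall_minutes


if __name__ == "__main__":
    # Count still decreasing: transient pending states are not a concern.
    print(indicates_problem(0, [(0, 5), (20, 4), (40, 4)]))
    # No decrease for 40 minutes: worth investigating.
    print(indicates_problem(0, [(0, 5), (10, 5), (40, 5)]))
```

Because the check looks only at whether the remaining count decreases, chassis that transiently appear pending with a single-path status do not trigger it while overall progress continues.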

Note that the Firmware Updates dialog does not currently refresh automatically; to get an updated view, close the dialog and reopen it.

Hardware updates are applied only when it is safe to do so, which means that the system may at times be in a state where hardware updates cannot be applied. This is particularly important in the context of clustered configurations. During takeover and failback operations, any in-progress firmware update is completed; pending firmware updates are suspended until the takeover or failback has completed, at which time the restrictions described below are reevaluated in the context of the new cluster state and, if possible, firmware updates resume.


Caution  -  Unless absolutely necessary, takeover and failback operations should not be performed while firmware updates are in progress.


The rolling update procedure documented later meets all of these best practices and addresses the per-device-class restrictions described later. It should always be followed when performing updates in a clustered environment. In both clustered and standalone environments, these criteria are also reevaluated upon any reboot or diagnostic system software restart, which may cause previously suspended or incomplete firmware updates to resume.

  • Components internal to the storage controller (such as HBAs and network devices) other than disks and certain SAS devices are generally upgraded automatically during boot; these updates are not visible and will have completed by the time the management interfaces become available.

  • Upgrading disk or flash device firmware requires that the device be taken offline during the process. If there is insufficient redundancy in the containing storage pool to allow this operation, the firmware update will not complete and may appear "stalled". However, if the storage pools are in an exported state, the disks will update as expected. Disks and flash devices that are part of a storage pool which is currently in use by the cluster peer, if any, are not upgraded.


    Note -  When updating the firmware on a system with striped pools, make sure both controllers are running the same version before attempting a pool unconfigure.
  • Upgrading the firmware in a disk shelf requires that both back-end storage paths be active to all disks within all enclosures, and for storage to be configured on all shelves to be upgraded. For clusters with at least one active pool on each controller, these restrictions mean that disk shelf firmware update can be performed only by a controller that is in the "owner" state.
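The per-device-class restrictions above can be summarized as two predicates. This is a hedged sketch for reasoning about why an update might appear stalled: the function and parameter names are hypothetical, the inputs are facts an administrator would determine separately, and the appliance's actual checks may involve more conditions than the text states.

```python
def disk_update_allowed(pool_has_redundancy, pool_exported,
                        pool_in_use_by_peer):
    """Disk or flash device firmware: the device must go offline.

    Per the text, devices in a pool currently in use by the cluster
    peer are not upgraded; otherwise the update proceeds if the pool
    is exported or has enough redundancy to lose the device briefly.
    """
    if pool_in_use_by_peer:
        return False
    return pool_exported or pool_has_redundancy


def shelf_update_allowed(both_paths_active_to_all_disks,
                         storage_configured_on_all_shelves):
    """Disk shelf firmware: both back-end paths must be active to all
    disks in all enclosures, and storage must be configured on all
    shelves to be upgraded."""
    return (both_paths_active_to_all_disks
            and storage_configured_on_all_shelves)


if __name__ == "__main__":
    # A striped pool in use locally has no redundancy: update stalls.
    print(disk_update_allowed(False, False, False))
    # The same pool in an exported state updates as expected.
    print(disk_update_allowed(False, True, False))
    # A disk with only one path blocks shelf firmware updates.
    print(shelf_update_allowed(False, True))
```

Writing the rules this way shows why a single-path disk or a non-redundant pool is enough to hold an update in the Pending state.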

During the firmware update process, hardware may appear to be removed and inserted, or offlined and onlined. While alerts attributed to these actions are suppressed, if you are viewing the Maintenance > Hardware screen or the Configuration > Storage screen, you may see the effects of these updates in the UI in the form of missing or offline devices. This is not a cause for concern; however, if a device remains offline or missing for an extended period of time (several minutes or more) even after refreshing the hardware view, this may be an indication of a problem with the device. Check the Maintenance > Problems view for any relevant faults that may have been identified.

Additionally, in some cases, the controllers in the disk shelves may remain offline during firmware update. If this occurs, no other controllers are upgraded until this condition is fixed. If an enclosure is listed as only having a single path for an extended period of time, check the physical enclosure to determine whether the green link lights on the back of the SIM or IOM are active. If not, remove and re-insert the SIM or IOM to re-establish the connection. Verify that all enclosures are reachable by two paths.