CHAPTER 3
Script Troubleshooting
This chapter lists and describes possible error messages, likely causes, and resolutions.
The s3kdlres script is designed to be safely rerun at any point. However, the script might not complete successfully if the process is interrupted or a timeout occurs, and manual intervention might be required to return the storage to a known state. Generally, that means performing a successful controller reset and confirming that sccli can communicate with the device (by restoring the IP address or the in-band mapping). The previously saved XML file is used for the rerun.
The script can bypass the firmware download and nvram reset by using the --restore=all option. In addition, individual types of settings can be restored with the other --restore=<xxx> options outlined previously. For example, if the script fails after the nvram reset, you can continue (possibly after manual intervention) with --restore=all.
If it is necessary to rerun the script from the beginning, the firmware is reloaded, the nvram is reset again, and all settings are restored from the saved XML file from the previous attempt. The major concern is the integrity of the XML file. On a clean first run, the script saves the XML file as specified on the command line. If the file exists, the script prompts whether it should be used to restore the configuration. The script will not overwrite the configuration file. For instance, if the script crashes immediately after the nvram reset, and the user restarts the script, the script will not generate a new XML file, and therefore will not overwrite the previously saved good configuration.
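Because the saved XML file is the only copy of the configuration used for a rerun, it can be worth keeping a second copy before restarting the script. The following is a minimal sketch, not part of s3kdlres: backup_xml is a hypothetical helper name, and the .bak naming scheme is an assumption.

```shell
# Sketch only: keep a timestamped, read-only copy of the saved XML
# configuration before rerunning s3kdlres.
# backup_xml is a hypothetical helper; it is not part of the upgrade script.
backup_xml() {
    xml="$1"
    # Refuse to continue if the saved configuration is missing or empty.
    if [ ! -s "$xml" ]; then
        echo "backup_xml: $xml is missing or empty" >&2
        return 1
    fi
    copy="$xml.$(date +%Y%m%d%H%M%S).bak"
    # -p preserves the original modification time and permissions.
    cp -p "$xml" "$copy" || return 1
    # Remove write permission so the copy is not modified by accident.
    chmod a-w "$copy"
    echo "$copy"
}
```

The function prints the name of the copy, so it can be recorded alongside the log files before the rerun.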
Note - Step-by-step interactions that take place when the upgrade script runs are logged to a file called s3kdlres.log, which is written to the directory where the script was run. If an upgrade fails, resulting in an indeterminate or incomplete status, contact authorized service personnel and make your log files (s3kdlres.log, <filename>.xml, and <filename>.txt) available to them. (See Creating a Configuration File for information about creating <filename>.txt.) |
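The three files named in the note can be gathered into one archive before contacting service personnel. This is a sketch under stated assumptions: collect_logs and the archive name s3kdlres-support.tar are hypothetical, and the files are assumed to sit in one directory.

```shell
# collect_logs BASENAME [DIR]
# Sketch only: bundle s3kdlres.log, BASENAME.xml, and BASENAME.txt from DIR
# (default: current directory) into a tar archive.
# The helper name and archive name are hypothetical.
collect_logs() {
    base="$1"
    dir="${2:-.}"
    out="$dir/s3kdlres-support.tar"
    files=""
    for f in s3kdlres.log "$base.xml" "$base.txt"; do
        if [ -f "$dir/$f" ]; then
            files="$files $f"
        else
            echo "collect_logs: $dir/$f not found, skipping" >&2
        fi
    done
    # Nothing to archive.
    [ -n "$files" ] || return 1
    # Word splitting on $files is intentional; names containing spaces
    # are not handled by this sketch.
    (cd "$dir" && tar cf s3kdlres-support.tar $files) || return 1
    echo "$out"
}
```

Missing files are reported and skipped rather than treated as fatal, because <filename>.txt exists only if it was created beforehand.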
This section describes possible error messages.
The sccli command failed to restore the setting. The script will continue. If necessary, the setting might be able to be restored after the script completes with sccli or the RS-232 interface. A summary of failed commands will be printed at the end of the script. Note that some settings might not be able to be recovered due to restrictions in the controller firmware. |
|
The setting is not supported with the current version of the firmware or Sun StorEdge CLI. |
|
Verify that the new firmware supports the settings, and then restore settings with sccli or the RS-232 interface after the script completes. |
|
Restoration completed with warnings. |
|
The script failed to restore some settings. A summary of failed commands is printed. |
|
The setting is not supported with the current version of the firmware or Sun StorEdge CLI. |
|
Restore settings with sccli or the RS-232 interface. Note that some settings might not be able to be recovered due to restrictions in the controller firmware. |
|
You must supply a file name as part of the s3kdlres command to save the XML data to that file (that is, in order to hold the XML configuration). Although the script does not require extensions on the file name, you can use one. |
|
Specify a file name on the command line. Restart the script. |
|
The script relies on the Sun StorEdge CLI being installed properly. This is normally installed as /opt/SUNWsscs/sbin/sccli for Solaris or UNIX operating systems. For Microsoft Windows, the default path name is the directory where the Sun StorEdge CLI package has been installed. If that fails, the script uses C:\Program Files\Sun\sccli. <sccli-cmd> is the full path name searched for by the script. |
|
Confirm that the package is installed correctly and that the location of sccli is correct. If necessary, specify the location on the command line: # s3kdlres saved-XML-filename --cli=path-to-sccli --device=<device> |
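Before restarting, the location of sccli can be confirmed from the shell. The sketch below is a user-side check and does not reproduce the script's own search order: find_sccli is a hypothetical helper, and only the Solaris/UNIX default path comes from the text above.

```shell
# find_sccli [CANDIDATE...]
# Sketch only: print the first candidate path that is an executable file.
# Extra candidates are tried first, then the Solaris/UNIX default named in
# the text, then whatever "sccli" resolves to on PATH.
# find_sccli is a hypothetical helper, not part of s3kdlres.
find_sccli() {
    for candidate in "$@" /opt/SUNWsscs/sbin/sccli; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    # Returns non-zero and prints nothing if sccli is not on PATH.
    command -v sccli
}
```

The printed path is the value to pass with the --cli option if the script cannot locate sccli on its own.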
|
Failed to contact RAID controller device: <device> Please correct the problem, or enter a new device, and try again |
|
The script attempts a simple sccli command to confirm communication with the RAID controller. <device> is the in-band or out-of-band device specified on the command line or the default device: 192.168.1.1 |
|
Confirm <device> is correct. Manually check communication with: |
|
The script attempts a simple sccli command to confirm communication with the RAID controller. <device> is the in-band or out-of-band device specified on the command line or the default device: 192.168.1.1 |
|
Failure to set network parameters for out-of-band or map LUN for in-band after nvram reset. |
|
Set the network parameters (IP address, netmask, and gateway) for out-of-band communication or map a LUN for in-band communication. Confirm communication with the controller: Restart the script with the --restore=all option using the existing XML configuration file saved: # s3kdlres <XMLfile> --device=<device> --restore=all See --restore=settings|channels|maps|all for more information. |
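A rerun with a --restore option depends on two things: a valid restore type and a usable saved XML file. As a sketch, the check can be made before the command is typed. restore_cmd is a hypothetical helper; it only prints the command line and does not run s3kdlres.

```shell
# restore_cmd XMLFILE DEVICE MODE
# Sketch only: validate the restore type and the saved XML file, then print
# the s3kdlres command line for the rerun. Nothing is executed.
# restore_cmd is a hypothetical helper, not part of the upgrade script.
restore_cmd() {
    xml="$1"
    device="$2"
    mode="$3"
    case "$mode" in
        settings|channels|maps|all) ;;
        *)
            echo "restore_cmd: unknown restore type: $mode" >&2
            return 2
            ;;
    esac
    # The rerun relies on the XML file saved by the earlier attempt.
    if [ ! -s "$xml" ]; then
        echo "restore_cmd: saved XML file $xml is missing or empty" >&2
        return 1
    fi
    echo "s3kdlres $xml --device=$device --restore=$mode"
}
```

Printing rather than executing keeps the decision to restart the script with the operator, after communication with the controller has been confirmed.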
|
The access mode is used for certain conditionals. sccli cannot determine what access mode is being used or an unknown access mode was encountered. |
|
Reset the controller using the serial port. Confirm access mode with:
If the problem persists, contact authorized Sun service personnel. |
|
For out-of-band only. The controller password is set. Either the password was not supplied on the command line, or the password was incorrect. |
|
# s3kdlres saved-XML-filename IP-address --password=<controllerpassword> |
|
Logical Volumes not supported in this release. The following Logical Volumes were found: Note: This error message only applies to firmware version 4.11. |
|
Logical volumes are not supported for this upgrade (firmware version 4.11). |
|
Back up logical volume data to tape or other media. Delete the logical volumes, and restart the script to complete the upgrade. After the upgrade is completed, recreate the logical volumes manually and recover the data from backup. |
|
There has been an installation error or the incorrect sccli has been specified on the command line. |
|
Verify that earlier versions are removed, and if using the --cli=path-to-cli option, verify that the path is correct. Make sure sccli 2.x or later is installed. |
|
<safte|ses> <rev> not supported *** Out of rev. SES or SAF-TE code detected. Please update SES or SAF-TE code with the following sccli command Update out of rev. <safte|ses> code <rev> |
|
Minimum SAF-TE code 1168 and SES code 1046 (FC) or 0413 (SATA) is required. |
|
Confirm SAF-TE or SES code with: The script lists a sample sccli command that you can use to download the SAF-TE or SES firmware. Upgrade SAF-TE or SES code using the command listed, and then restart the script. |
|
An error was encountered creating or saving the XML configuration. The <error> tag might provide additional information such as "permission denied." |
|
Once you have verified permissions and available space, rerun the script. Note: The XML file created is typically less than 100 Kbytes. |
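Permissions and available space can be checked from the shell before the rerun. This is a sketch, assuming a POSIX df (on some older systems the POSIX-conforming df lives in a separate directory such as /usr/xpg4/bin). can_save_xml is a hypothetical helper, and the 100-Kbyte default is taken from the note above.

```shell
# can_save_xml DIR [MIN_KB]
# Sketch only: succeed if DIR is a writable directory with at least MIN_KB
# kilobytes free (default 100, the typical upper size of the XML file).
# can_save_xml is a hypothetical helper, not part of s3kdlres.
can_save_xml() {
    dir="$1"
    min_kb="${2:-100}"
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "can_save_xml: $dir is not a writable directory" >&2
        return 1
    fi
    # df -Pk prints one POSIX-format line per file system in 1024-byte
    # units; column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "$avail_kb" -lt "$min_kb" ]; then
        echo "can_save_xml: only $avail_kb Kbytes free in $dir" >&2
        return 1
    fi
    return 0
}
```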
|
The following devices have too many partitions for this upgrade. Please delete partitions 32 and above: |
|
Firmware 4.xx supports up to 32 partitions per logical drive or logical volume. More than 32 partitions per logical drive or logical volume were found. |
|
Back up or move all data on all partitions of the listed logical drives or volumes. Delete all partitions for logical drives or logical volumes with more than 32 partitions per logical drive or logical volume. Create a configuration that contains no more than 32 partitions per logical drive or logical volume, and then restore the data. Delete all partitions on the specified logical drives or volumes with: # sccli <device> configure partition <partition> delete Refer to the sccli man page for details on the configure partition command. |
|
Caution: Deleting the partition destroys the data. Data cannot be recovered. Refer to the sccli man page for details on deleting partitions. |
|
Note: Deleting partitions can also be completed using controller firmware commands. Refer to the Sun StorEdge 3000 Family RAID Firmware User's Guide for detailed instructions. |
|
Please change the Controller Assignment for the following: <logical-<drive|volume>: Id: <id>, Change Assignment to <Primary|Secondary> |
|
After an nvram reset, all logical drives and logical volumes are changed to Primary. Any logical drives that were originally assigned to the Secondary controller must be reassigned to the Secondary controller, but sccli failed to do so. |
|
sccli failed to change the logical drive or logical volume assignment. |
|
Change the logical drive or logical volume assignment using the serial interface:
1. From the Main Menu, choose "view and edit Logical drives."
2. Select the logical drive that you want to reassign.
3. Choose "logical drive Assignments," and then choose Yes to confirm the reassignment. The reassignment is evident from the view and edit Logical drives screen. An LG number such as P0 means that the logical drive is assigned to the primary controller. An LG number such as S0 means that the logical drive is assigned to the secondary controller.
4. Restart the script with the --restore=maps option using the existing XML configuration file saved:
# s3kdlres <XMLfile> --device=<device> --restore=maps
See --restore=settings|channels|maps|all for more information. |
|
Redundancy status after controller reset does not match original Original Redundancy Status: <status> |
|
The controller redundancy status is read and saved before the firmware download happens. After the firmware download, the redundancy status is read again and compared to the original. There is a one-minute timeout for redundant controllers to bind as a pair after the firmware download and controller reset. |
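The compare-with-timeout behavior described here amounts to polling a status until it matches the expected value or a time limit passes. The sketch below illustrates that pattern only and is not the script's implementation. wait_for_status is a hypothetical helper, and the status command is passed in by the caller because no specific status command is assumed.

```shell
# wait_for_status EXPECTED TIMEOUT_SECS INTERVAL_SECS COMMAND [ARG...]
# Sketch only: run COMMAND repeatedly until its output equals EXPECTED or
# roughly TIMEOUT_SECS seconds have passed.
# wait_for_status is a hypothetical helper, not part of s3kdlres.
wait_for_status() {
    expected="$1"
    timeout="$2"
    interval="$3"
    shift 3
    elapsed=0
    while :; do
        # A failing status command is treated as "no status yet".
        status=$("$@" 2>/dev/null) || status=""
        if [ "$status" = "$expected" ]; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "wait_for_status: timed out; last status: $status" >&2
            return 1
        fi
        sleep "$interval"
        # Elapsed time counts sleep intervals only, so it is approximate.
        elapsed=$((elapsed + interval))
    done
}
```

With a 60-second timeout, this mirrors the one-minute window the script allows for redundant controllers to bind as a pair.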
|
Controllers did not bind as a pair after the firmware upgrade or the firmware upgrade was attempted on a unit in degraded mode (that is, the controller failed). |
|
Using the serial port, reset the controller, and restart the script using a new, unique file name to save the XML configuration data: # s3kdlres new-saved-XML-file --device=<device> A firmware download will be re-attempted even if the upgrade was successful. If the problem persists, see 1, 2, or 3 below. [1] If the New Redundancy Status shows "Detecting," check the redundancy status with: |
|
[2] If the redundancy status continues to show "Detecting," the controllers are not binding as a pair. Contact authorized Sun service personnel. |
|
[3] If the redundancy status shows "Enabled," the controllers have now bound as a pair. Possibly the script timed out before the controllers bound. |
|
After the reset completes, set the IP address for out-of-band communication or map a LUN for in-band communication. |
|
Restart the script with the --restore=all option, using the existing XML configuration file saved: # s3kdlres <XMLfile> --device=<device> --restore=all See --restore=settings|channels|maps|all for more information. |
|
If the firmware is incorrect or the problem persists, contact authorized Sun service personnel. |
|
The following errors are considered exceptions and are accompanied by a back trace, providing details on the failure. Back-trace details might be useful to authorized Sun service personnel. Report all exceptions to authorized Sun service personnel. |
|
The script uses sccli to perform all operations on the controller. The command passed to sccli resulted in a non-zero exit status (failure), or the command timed out. The <error> tag may provide additional information. The <result> tag is the output result of sccli. |
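The failure condition described here, a non-zero exit status together with the command's output, can be captured in a small wrapper when running commands by hand. This is a sketch, not the script's own error handling: run_checked, RESULT, and STATUS are hypothetical names, and no timeout is implemented.

```shell
# run_checked COMMAND [ARG...]
# Sketch only: run a command, keep its combined output in RESULT and its
# exit status in STATUS, and report both if the command fails.
# run_checked, RESULT, and STATUS are hypothetical names.
run_checked() {
    # If the command fails, STATUS receives its non-zero exit status.
    RESULT=$("$@" 2>&1) && STATUS=0 || STATUS=$?
    if [ "$STATUS" -ne 0 ]; then
        echo "command failed (exit $STATUS): $*" >&2
        echo "result: $RESULT" >&2
    fi
    return "$STATUS"
}
```

Keeping the output together with the exit status makes it easier to pass the same details to service personnel that the exception back trace would contain.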
|
Failure to contact device [see 1 below], unexpected XML tag [see 2 below], or command exceeded 20-minute timeout [see 3 below]. |
|
[1] If the <result> tag indicates a failure to contact the device as reported by sccli, it might be the result of an in-band device going offline in the process. If possible, use an out-of-band connection. During the process, the in-band device is "unmapped" from the RAID controller. This happens during the nvram reset and after "Restoring Channels Ids." If it happens after the nvram reset, it might be possible to recover by restarting the controller, mapping the LUN, and continuing with --restore=all. Before rerunning the script with --restore=all, run the sccli about command to verify communication with the device. |
|
Restart the script with the --restore=all option, using the existing XML configuration file saved: # s3kdlres <XMLfile> --device=<device> --restore=all See --restore=settings|channels|maps|all for more information. |
|
If it happens after "Restoring Channel Ids," it might be possible to recover with other --restore options: |
|
At this point, a controller reset is required to cause the channel settings to apply. An in-band device might be offline again. Remap the LUN using the RS-232 serial interface, and restore the remaining LUN maps: # s3kdlres <XMLfile> --device=<device> --restore=maps See --restore=settings|channels|maps|all for more information. |
|
[2] The script uses data supplied from the XML file to construct sccli commands. If the XML data is not valid or unexpected, an invalid sccli command may be constructed. It may be possible to edit the XML file, and restart the script with the --restore=all option using an existing XML configuration file edited and saved: # s3kdlres saved-XML-filename --device=<device> --restore=all See --restore=settings|channels|maps|all for more information. |
|
Caution: Only edit the XML file under the direction of authorized Sun service personnel. If this file becomes corrupted, you will not be able to use it to restore your configuration and all data could be lost. |
|
[3] Twenty-minute and other timeouts are generally the result of hardware issues. In-band devices are discussed above. For out-of-band devices, there might be underlying network issues such as a stale ARP cache. Check network connectivity with: Report other conditions to authorized Sun service personnel. |
|
Cannot find Channel Assignment for ch: <ch>, tgt: <tgt>, found: <assignment> |
|
The script cannot determine the assignment (Primary or Secondary) for the channel and target listed from the XML data. |
|
Restore the channel settings manually with the sccli command: # sccli <device> configure channel channel { host | drive } options (Refer to the sccli man page for details on the configure channel command.) Reset the controller, and continue to restore LUN maps with the existing saved XML file: # sccli <device> reset controller # s3kdlres <XMLfile> --device=<device> --restore=maps See --restore=settings|channels|maps|all for more information. Note: Resetting the controller and restoring the channel settings can also be completed using controller firmware commands. Refer to the Sun StorEdge 3000 Family RAID Firmware User's Guide for detailed instructions. |
|
Cannot find Channel Assignment for ld/lv: <id>, found: <assignment> |
|
The script cannot determine the assignment (Primary or Secondary) for the logical drive or logical volume listed. |
|
Restore the LUN mappings manually with the sccli command: # sccli <device> map partition ch.tgt.lun [ wwpn ] (Refer to the sccli man page for details on the map partition command.) Note: Restoring the LUN mappings can also be completed using controller firmware commands. Refer to the Sun StorEdge 3000 Family RAID Firmware User's Guide for detailed instructions. |
|
A logical drive or logical volume was found in the XML file, but some attribute information is corrupt or missing. |
|
Reset the controller. Restart the script. If the problem persists, contact authorized Sun service personnel. |
|
The script confirms the product to be upgraded is supported. |
|
Reset the controller. Restart the script. If the problem persists, contact authorized Sun service personnel. |
Copyright © 2007, Dot Hill Systems Corporation. All rights reserved.