CHAPTER 3

Script Troubleshooting

This chapter lists and describes possible error messages, likely causes, and resolutions.

The s3kdlres script is designed to be rerun safely at any point. However, the script might not complete successfully if the process is interrupted or a timeout occurs. In that case, manual intervention might be required to return the storage to a known state. Generally, that means performing a successful controller reset and confirming that sccli can communicate with the device (by restoring the IP address for out-of-band access or the LUN mapping for in-band access). The script then uses the previously saved XML file for the rerun.

The script can bypass the firmware download and nvram reset when you use the --restore=all option. In addition, you can restore individual types of settings, as outlined previously, with the other --restore=<xxx> options. For example, if the script fails after the nvram reset, you can continue with --restore=all, possibly after some manual intervention.

If it is necessary to rerun the script from the beginning, the firmware is reloaded, the nvram is reset again, and all settings are restored from the XML file saved during the previous attempt. The major concern is therefore the integrity of the XML file. On a clean first run, the script saves the configuration to the XML file named on the command line. If that file already exists, the script asks whether to use it to restore the configuration; the script never overwrites an existing configuration file. For instance, if the script crashes immediately after the nvram reset and you restart it, the script does not generate a new XML file (which would capture only the post-reset configuration), and therefore does not overwrite the good configuration saved earlier.
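The "never overwrite" rule can be illustrated with a short Python sketch. The helper save_config_once is hypothetical and is not taken from the s3kdlres source; it relies on Python's exclusive-creation file mode ("x"), which fails if the file already exists, so a configuration saved on an earlier run is left intact.

```python
def save_config_once(xml_path: str, xml_data: str) -> bool:
    """Write xml_data to xml_path only if the file does not already exist.

    Returns True if a new file was written, False if an existing file was
    left untouched. Hypothetical helper for illustration only.
    """
    try:
        # Mode "x" creates the file and raises FileExistsError if it is
        # already present, so an earlier configuration is never clobbered.
        with open(xml_path, "x", encoding="utf-8") as f:
            f.write(xml_data)
        return True
    except FileExistsError:
        return False
```

On a rerun, a False result corresponds to the case in which the script would instead ask whether to restore from the existing file.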



Note - The script logs each step of the upgrade to a file named s3kdlres.log in the directory from which the script was run. If an upgrade fails and leaves the storage in an indeterminate or incomplete state, contact authorized service personnel and make these files (s3kdlres.log, <filename>.xml, and <filename>.txt) available to them. (See Creating a Configuration File for information about creating <filename>.txt.)




Error Messages

This section describes possible error messages.



Note - If a controller name contains special characters such as ampersands, errors occur but no error message is displayed. If you must use special characters in a controller name, follow standard UNIX shell quoting rules and enclose the special characters in single or double quotation marks.
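As an illustration of the quoting this note calls for, Python's standard shlex.quote function produces a single-quoted form of a string that a POSIX shell passes through as one argument. The controller name lab&test below is hypothetical.

```python
import shlex

# Hypothetical controller name containing an ampersand. Unquoted, a UNIX
# shell would treat "&" as its background-job operator.
name = "lab&test"

# shlex.quote wraps the string in single quotes because it contains a
# character that is special to the shell.
quoted = shlex.quote(name)
print(quoted)  # prints 'lab&test'

# A POSIX-style split of the quoted text yields the original name as one word.
assert shlex.split(quoted) == [name]
```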



Message

WARNING cli command failed continuing...

Command: <cmd>

The CLI response was:

<result>

Please check settings manually.

Occurs

Restoration of settings

Meaning

The sccli command failed to restore the setting, and the script continues. You might be able to restore the setting after the script completes, using sccli or the RS-232 interface. A summary of failed commands is printed at the end of the script. Note that some settings might not be recoverable because of restrictions in the controller firmware.

Likely Cause

The setting is not supported with the current version of the firmware or Sun StorEdge CLI.

Resolution

Verify that the new firmware supports the settings, and then restore settings with sccli or the RS-232 interface after the script completes.

Message

Restoration completed with warnings.

Please check settings.

The following cli commands failed:

<sccli cmds>

Occurs

At script end

Meaning

The script failed to restore some settings. A summary of failed commands is printed.

Likely Cause

The setting is not supported with the current version of the firmware or Sun StorEdge CLI.

Resolution

Restore the settings with sccli or the RS-232 interface. Note that some settings might not be recoverable because of restrictions in the controller firmware.
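The continue-then-summarize behavior described by the two messages above can be sketched in Python as follows. This is a hypothetical model, not the script's code: the run callable stands in for whatever actually invokes sccli and returns True on success, and the command strings passed in are placeholders.

```python
from typing import Callable, List

def restore_settings(commands: List[str], run: Callable[[str], bool]) -> List[str]:
    """Run each restore command, continue past failures, return the failures."""
    failed = []
    for cmd in commands:
        if not run(cmd):
            # Per-command warning; the loop keeps going.
            print("WARNING cli command failed continuing...")
            print(f"Command: {cmd}")
            failed.append(cmd)
    if failed:
        # End-of-run summary of everything that failed.
        print("Restoration completed with warnings.")
        print("Please check settings.")
        print("The following cli commands failed:")
        for cmd in failed:
            print(cmd)
    return failed
```

The returned list is the set of settings you would then restore by hand.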

Message

A filename is required

Occurs

Startup check

Meaning

You must specify a file name on the s3kdlres command line; the script saves the XML configuration data to that file. The script does not require a file name extension, but you can use one.

Likely Cause

Usage error

Resolution

Specify a file name on the command line. Restart the script.

Message

<sccli-cmd> not found or not executable

Enter full pathname for sccli to continue

Occurs

Startup check

Meaning

The script relies on the Sun StorEdge CLI being installed properly. sccli is normally installed as /opt/SUNWsscs/sbin/sccli on Solaris and other UNIX operating systems. On Microsoft Windows, the script first looks in the directory where the Sun StorEdge CLI package was installed; if that fails, it uses C:\Program Files\Sun\sccli. <sccli-cmd> is the full path name that the script searched for.

Likely Cause

Usage error

Resolution

Confirm that the package is installed correctly and that the location of sccli is correct.

Restart the script with:

# s3kdlres saved-XML-filename --cli=path-to-sccli --device=<device>
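If you want to check a candidate sccli path before restarting the script, a Python sketch like the following tests whether a file exists and is executable. The function name find_sccli is hypothetical; only /opt/SUNWsscs/sbin/sccli comes from this guide, and the Windows search order described above is not modeled.

```python
import os
from typing import Iterable, Optional

# Default location documented for Solaris and other UNIX systems.
UNIX_DEFAULT_SCCLI = "/opt/SUNWsscs/sbin/sccli"

def find_sccli(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that is an executable regular file, else None."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None

if __name__ == "__main__":
    found = find_sccli([UNIX_DEFAULT_SCCLI])
    print(found or "sccli not found or not executable")
```

A path that this check accepts is a reasonable value for the --cli=path-to-sccli option.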

Message

Failed to contact RAID controller device: <device>

Please correct the problem, or enter a new device, and try again

Occurs

Startup check

Meaning

The script runs a simple sccli command to confirm communication with the RAID controller. <device> is the in-band or out-of-band device specified on the command line, or the default device, 192.168.1.1.

Likely Cause

Usage error or device cannot be reached

Resolution

Confirm that <device> is correct. Manually check communication with:

# sccli <device> about

Restart the script.

Message

Failed to contact RAID controller.

Please correct the problem and try again.

Occurs

After firmware download and nvram reset

Meaning

The script runs a simple sccli command to confirm communication with the RAID controller. <device> is the in-band or out-of-band device specified on the command line, or the default device, 192.168.1.1.

Likely Cause

After the nvram reset, the network parameters were not set for out-of-band access, or no LUN was mapped for in-band access.

Resolution

Set the network parameters (IP address, netmask, and gateway) for out-of-band communication or map a LUN for in-band communication.

Confirm communication with the controller:

# sccli <device> about

Restart the script with the --restore=all option using the existing XML configuration file saved:

# s3kdlres <XMLfile> --device=<device> --restore=all

See --restore=settings|channels|maps|all for more information.

Message

Unknown Access Mode: <access-mode>

Wanted "inband" or "out-of-band"

Occurs

Startup check

Meaning

The script uses the access mode to decide how to perform certain steps. Either sccli cannot determine which access mode is in use, or it reported an unknown access mode.

Likely Cause

Bad XML data, an sccli bug, or a script bug

Resolution

Reset the controller using the serial port.

Confirm the access mode with:

# sccli <device> show access

Restart the script.

If the problem persists, contact authorized Sun service personnel.

Message

Password is required.

Password is not correct.

Occurs

Startup check

Meaning

This message applies to out-of-band access only. A controller password is set, and either no password was supplied on the command line or the password supplied was incorrect.

Likely Cause

Usage error

Resolution

Restart the script with:

# s3kdlres saved-XML-filename IP-address --password=<controllerpassword>

Message

Unsupported firmware found.

Upgrades supported to firmware 4xxx only.

Occurs

Firmware download

Meaning

Only upgrades from 3.2x to 4.xx are supported.

Likely Cause

An incorrect firmware download was attempted.

Resolution

The firmware can be downloaded manually with sccli:

# sccli <device> download controller-firmware <fn>

Message

Upgrades to firmware <rev> not supported.

Upgrades supported to firmware 4xxx only.

Occurs

Firmware download

Meaning

Only upgrades from 3.2x to 4.xx are supported.

Likely Cause

An incorrect firmware download was attempted.

Resolution

The firmware can be downloaded manually with sccli:

# sccli <device> download controller-firmware <fn>

Message

Upgrades from firmware <rev> not supported.

Upgrades supported from filename 32xx only.

Occurs

Firmware download

Meaning

Only upgrades from 3.2x to 4.xx are supported.

Likely Cause

There is incorrect firmware loaded on the controller.

Resolution

The firmware can be downloaded manually with sccli:

# sccli <device> download controller-firmware <fn>
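The rule shared by the three firmware messages above (only 3.2x to 4.xx upgrades are supported) can be expressed as a small Python check. The revision string formats, such as "3.27R" or "4.11", are assumptions made for illustration; how the script actually parses revisions is not documented here, and the function name is hypothetical.

```python
def upgrade_supported(current_rev: str, target_rev: str) -> bool:
    """Return True only for a 3.2x -> 4.xx controller firmware upgrade.

    Dots are removed first so that "3.27R" and "327R" are treated alike
    (an assumption about revision formatting, for illustration only).
    """
    current = current_rev.replace(".", "")
    target = target_rev.replace(".", "")
    return current.startswith("32") and target.startswith("4")
```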

Message

Logical Volumes not supported in this release.

The following Logical Volumes were found:

Logical Volume id: <lv-id>

Note: This error message applies only to firmware version 4.11.

Occurs

Firmware download

Meaning

Logical volumes are not supported for this upgrade (firmware version 4.11).

Likely Cause

Logical volumes are present.

Resolution

Back up logical volume data to tape or other media. Delete the logical volumes, and restart the script to complete the upgrade. After the upgrade is completed, recreate the logical volumes manually and recover the data from backup.

Message

sccli version <version> not supported. Please update sccli

Occurs

Startup check

Meaning

sccli version 2.x or later is required.

Likely Cause

sccli was not installed correctly, or the wrong sccli was specified on the command line.

Resolution

Verify that earlier versions of sccli have been removed and that sccli 2.x or later is installed. If you use the --cli=path-to-sccli option, verify that the path is correct.

Restart the script with:

# s3kdlres saved-XML-filename --device=<device>
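The version rule (sccli 2.x or later) amounts to comparing the major version number. The Python sketch below assumes the version string starts with the major number, as in "2.0"; the function name is hypothetical.

```python
import re

def sccli_version_ok(version: str) -> bool:
    """Return True if the leading (major) version number is 2 or later."""
    match = re.match(r"\s*(\d+)", version)
    return match is not None and int(match.group(1)) >= 2
```

Comparing the major number as an integer, rather than as text, keeps a hypothetical "10.0" from sorting below "2.0".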

Message

<safte|ses> <rev> not supported

*** Out of rev. SES or SAF-TE code detected.

Please update SES or SAF-TE code with the following sccli command

Update out of rev. <safte|ses> code <rev>

sccli <device> download <safte|ses>-firmware <fn>

Where <fn> is name of the firmware file

Occurs

Startup check

Meaning

The minimum required revisions are SAF-TE code 1168 and SES code 1046 (FC) or 0413 (SATA).

Likely Cause

Out-of-date SAF-TE or SES code

Resolution

Check the SAF-TE or SES code revision with:

# sccli <device> show safte

# sccli <device> show ses

The script lists a sample sccli command that you can use to download the SAF-TE or SES firmware. Upgrade SAF-TE or SES code using the command listed, and then restart the script.
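The minimum revisions listed under Meaning can be checked with a short Python sketch. It assumes the revision is a purely numeric string such as "1168" or "0413" (compared as integers) and that you already know whether the enclosure is FC or SATA; the function and table names are hypothetical.

```python
# Minimum revisions from the Meaning row above.
MIN_SAFTE = 1168
MIN_SES = {"FC": 1046, "SATA": 413}  # SES "0413" compared numerically

def enclosure_code_ok(kind: str, rev: str, interface: str = "FC") -> bool:
    """kind is "safte" or "ses"; rev is a numeric revision string."""
    value = int(rev)
    if kind == "safte":
        return value >= MIN_SAFTE
    if kind == "ses":
        return value >= MIN_SES[interface]
    raise ValueError(f"unknown enclosure code type: {kind}")
```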

Message

Can't open XML file: <XMLfile>: <error>

Failed to save XML file: <XMLfile>: <error>

Occurs

Startup check

Meaning

The script encountered an error while creating or saving the XML configuration file. The <error> tag might provide additional information, such as "permission denied."

Likely Cause

Insufficient privileges, or the file system is full.

Resolution

Verify the file permissions and the available disk space, and then rerun the script.

Note: The XML file is typically smaller than 100 Kbytes.
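Both checks in the resolution can be done from Python with the standard library. The sketch below is hypothetical (can_save_xml is not part of s3kdlres) and uses the typical 100-Kbyte file size only as a rough margin.

```python
import os
import shutil

def can_save_xml(directory: str, needed_bytes: int = 100 * 1024) -> bool:
    """Return True if `directory` is writable and has at least `needed_bytes` free.

    The 100-Kbyte default reflects the typical XML file size noted above;
    it is a rough margin, not a limit enforced by s3kdlres.
    """
    if not os.path.isdir(directory) or not os.access(directory, os.W_OK):
        return False
    return shutil.disk_usage(directory).free >= needed_bytes
```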

Message

The following devices have too many partitions for this upgrade.

Please delete partitions 32 and above:

--------------------------------------------

<logical-drive|volume>: Id: <id>, Partitions: <nPartitions>

Occurs

Startup check

Meaning

Firmware 4.xx supports up to 32 partitions per logical drive or logical volume. More than 32 partitions per logical drive or logical volume were found.

Likely Cause

More than 32 partitions detected.

Resolution

Back up or move all data on all partitions of the listed logical drives or logical volumes. Delete all partitions on every logical drive or logical volume that has more than 32 partitions. Create a configuration with no more than 32 partitions per logical drive or logical volume, and then restore the data.

Delete all partitions on the specified logical drives or volumes with:

# sccli <device> configure partition <partition> delete

Refer to the sccli man page for details on the configure partition command.

Caution: Deleting the partition destroys the data. Data cannot be recovered. Refer to the sccli man page for details on deleting partitions.

Restart the script.

Note: Deleting partitions can also be completed using controller firmware commands. Refer to the Sun StorEdge 3000 Family RAID Firmware User's Guide for detailed instructions.
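Before deleting anything, you can list which logical drives or logical volumes exceed the firmware 4.xx limit. The Python sketch below assumes you have already collected a partition count per device into a dictionary; the device IDs and function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Firmware 4.xx limit per logical drive or logical volume.
MAX_PARTITIONS = 32

def over_partition_limit(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (id, partition count) pairs that exceed MAX_PARTITIONS, sorted by id."""
    return sorted((dev_id, n) for dev_id, n in counts.items() if n > MAX_PARTITIONS)
```

A device with exactly 32 partitions is within the limit and is not reported.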

Message

Please change the Controller Assignment for the following:

<logical-drive|volume>: Id: <id>, Change Assignment to <Primary|Secondary>

Then rerun the script with the option:

--restore=maps

Occurs

Restoring maps

Meaning

After an nvram reset, all logical drives and logical volumes are assigned to the primary controller. The script must reassign any logical drive or logical volume that was originally assigned to the secondary controller, but sccli failed to do so.

Likely Cause

sccli failed to change the logical drive or logical volume assignment.

Resolution

Change the logical drive or logical volume assignment using the serial interface:

1. From the Main Menu, choose "view and edit Logical drives."

2. Select the logical drive that you want to reassign.

3. Choose "logical drive Assignments," and then choose Yes to confirm the reassignment. The new assignment is shown on the "view and edit Logical drives" screen: an LG number such as P0 means that the logical drive is assigned to the primary controller, and an LG number such as S0 means that it is assigned to the secondary controller.

4. Restart the script with the --restore=maps option using the existing XML configuration file saved:

# s3kdlres <XMLfile> --device=<device> --restore=maps

See --restore=settings|channels|maps|all for more information.
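When you check the result of step 3, the LG prefix tells you the assignment. The small Python helper below (hypothetical, for illustration only) encodes that rule and assumes the LG number is a P or S followed only by digits.

```python
def lg_assignment(lg_number: str) -> str:
    """Map an LG number such as "P0" or "S0" to its controller assignment."""
    prefixes = {"P": "Primary", "S": "Secondary"}
    if len(lg_number) < 2 or lg_number[0] not in prefixes or not lg_number[1:].isdigit():
        raise ValueError(f"unrecognized LG number: {lg_number!r}")
    return prefixes[lg_number[0]]
```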

Message

Redundancy status after controller reset does not match original

Redundancy status.

Original Redundancy Status: <status>

New Redundancy Status: <status>

Correct problem and rerun script from the beginning.

Occurs

After firmware download, before nvram reset

Meaning

The script reads and saves the controller redundancy status before the firmware download. After the download, it reads the redundancy status again and compares it to the original. After the firmware download and controller reset, the script allows one minute for redundant controllers to bind as a pair.

Likely Cause

The controllers did not bind as a pair after the firmware upgrade, or the firmware upgrade was attempted on a unit in degraded mode (that is, a controller had failed).

Resolution

Using the serial port, reset the controller, and restart the script using a new, unique file name to save the XML configuration data:

# s3kdlres new-saved-XML-file --device=<device>

The script attempts the firmware download again, even if the earlier upgrade succeeded.

If the problem persists, see [1], [2], or [3] below.

[1] If the New Redundancy Status shows "Detecting," check the redundancy status with:

# sccli <device> show redundancy

[2] If the redundancy status continues to show "Detecting," the controllers are not binding as a pair. Contact authorized Sun service personnel.

[3] If the redundancy status shows "Enabled," the controllers have now bound as a pair. The script might have timed out before the controllers bound.

Confirm the correct firmware revision with:

# sccli <device> show inquiry

If the firmware is correct, reset the nvram and then the controller with:

# sccli <device> reset nvram

# sccli <device> reset controller

After the reset completes, set the IP address for out-of-band communication or map a LUN for in-band communication.

Confirm communication with the controller:

# sccli <device> about

Restart the script with the --restore=all option, using the existing XML configuration file saved:

# s3kdlres <XMLfile> --device=<device> --restore=all

See --restore=settings|channels|maps|all for more information.

If the firmware is incorrect, contact authorized Sun service personnel.

If the problem persists, contact authorized Sun service personnel.
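The one-minute binding window described under Meaning can be pictured as a polling loop. In the Python sketch below, read_status stands in for however the status is obtained (for example, by parsing the output of sccli <device> show redundancy, which is not shown), and the function name and poll interval are hypothetical; "Enabled" and "Detecting" are the status values named in this section.

```python
import time
from typing import Callable

def wait_for_redundancy(read_status: Callable[[], str], expected: str,
                        timeout_s: float = 60.0, poll_s: float = 5.0) -> str:
    """Poll until the status equals `expected` or `timeout_s` elapses.

    Returns the last status read, so the caller can compare it with the
    original redundancy status.
    """
    deadline = time.monotonic() + timeout_s
    status = read_status()
    while status != expected and time.monotonic() < deadline:
        time.sleep(poll_s)
        status = read_status()
    return status
```

If the returned status still differs from the original, that corresponds to the redundancy mismatch reported by this message; a status that changes to "Enabled" shortly after the window closes matches case [3].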

The following errors are treated as exceptions and are accompanied by a back trace that gives details of the failure. The back-trace details might be useful to authorized Sun service personnel. Report all exceptions to authorized Sun service personnel.

Message

Failed to run command: <cmd>: <error>

result: <result>

Occurs

Varies

Meaning

The script uses sccli to perform all operations on the controller. The command passed to sccli returned a non-zero exit status (failure), or the command timed out. The <error> tag might provide additional information. The <result> tag contains the output from sccli.

Likely Cause

The script failed to contact the device [see 1 below], encountered an unexpected XML tag [see 2 below], or a command exceeded the 20-minute timeout [see 3 below].

Resolution

[1] If the <result> tag shows that sccli failed to contact the device, an in-band device might have gone offline during the upgrade. If possible, use an out-of-band connection. During the upgrade, the in-band device is unmapped from the RAID controller; this happens during the nvram reset and after "Restoring Channel Ids." If the failure occurs after the nvram reset, you might be able to recover by resetting the controller, mapping the LUN, and continuing with --restore=all.

Before rerunning the script with --restore=all, run the sccli about command to verify communication with the device:

# sccli <device> about

Restart the script with the --restore=all option, using the existing XML configuration file saved:

# s3kdlres <XMLfile> --device=<device> --restore=all

See --restore=settings|channels|maps|all for more information.

If it happens after "Restoring Channel Ids," it might be possible to recover with other --restore options:

# s3kdlres <XMLfile> --device=<device> --restore=settings

# s3kdlres <XMLfile> --device=<device> --restore=channels

At this point, a controller reset is required for the channel settings to take effect, and an in-band device might go offline again. Remap the LUN using the RS-232 serial interface, and then restore the remaining LUN maps:

# s3kdlres <XMLfile> --device=<device> --restore=maps

See --restore=settings|channels|maps|all for more information.

[2] The script uses data from the XML file to construct sccli commands. If the XML data is invalid or unexpected, the script might construct an invalid sccli command.

You might be able to correct the XML file and then restart the script with the --restore=all option, using the edited XML configuration file:

# s3kdlres saved-XML-filename --device=<device> --restore=all

See --restore=settings|channels|maps|all for more information.

Caution: Only edit the XML file under the direction of authorized Sun service personnel. If this file becomes corrupted, you will not be able to use it to restore your configuration and all data could be lost.

[3] Twenty-minute and other timeouts are generally the result of hardware issues. For in-band devices, see [1] above.

For out-of-band devices, there might be an underlying network issue, such as a stale ARP cache.

Check network connectivity with:

# ping <out-of-band device>

# sccli <device> about

Report other conditions to authorized Sun service personnel.
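The two failure conditions described for this message (a non-zero exit status or a timeout) can be illustrated with Python's subprocess module. run_command and CommandFailed are hypothetical names; the 20-minute default mirrors the timeout mentioned under Likely Cause, and nothing here reflects how s3kdlres actually invokes sccli.

```python
import subprocess
from typing import List

# Default matches the 20-minute command timeout described above.
TWENTY_MINUTES = 20 * 60

class CommandFailed(Exception):
    """Raised when a command exits non-zero or times out."""

def run_command(argv: List[str], timeout_s: float = TWENTY_MINUTES) -> str:
    """Run argv and return its standard output, or raise CommandFailed."""
    cmd = " ".join(argv)
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        raise CommandFailed(f"Failed to run command: {cmd}: timed out") from exc
    if proc.returncode != 0:
        raise CommandFailed(
            f"Failed to run command: {cmd}: exit status {proc.returncode}\n"
            f"result: {proc.stdout}{proc.stderr}"
        )
    return proc.stdout
```

For example, passing ["sccli", "<device>", "about"] (with a real device name) would run the same connectivity check recommended above and raise CommandFailed if it fails.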

Message

Cannot find Channel Assignment for ch: <ch>, tgt: <tgt>, found: <assignment>

Occurs

Restoring channels

Meaning

The script cannot determine from the XML data the assignment (Primary or Secondary) for the channel and target listed.

Likely Cause

Bad XML data

Resolution

Restore the channel settings manually with the sccli command:

# sccli <device> configure channel channel { host | drive } options

(Refer to the sccli man page for details on the configure channel command.)

Reset the controller, and continue to restore LUN maps with the existing saved XML file:

# sccli <device> reset controller

# s3kdlres <XMLfile> --device=<device> --restore=maps

See --restore=settings|channels|maps|all for more information.

Note: Resetting the controller and restoring the channel settings can also be completed using controller firmware commands. Refer to the Sun StorEdge 3000 Family RAID Firmware User's Guide for detailed instructions.

Message

Cannot find Channel Assignment for ld/lv: <id>, found: <assignment>

Occurs

Restoring maps

Meaning

The script cannot determine the assignment (Primary or Secondary) for the logical drive or logical volume listed.

Likely Cause

Bad XML data

Resolution

Restore the LUN mappings manually with the sccli command:

# sccli <device> map partition ch.tgt.lun [ wwpn ]

(Refer to the sccli man page for details on the map partition command.)

Note: Restoring the LUN mappings can also be completed using controller firmware commands. Refer to the Sun StorEdge 3000 Family RAID Firmware User's Guide for detailed instructions.

Message

No id found

Occurs

Startup check

Meaning

A logical drive or logical volume was found in the XML file, but some attribute information is corrupt or missing.

Likely Cause

Bad XML data

Resolution

Reset the controller. Restart the script. If the problem persists, contact authorized Sun service personnel.

Message

Can't find product: <product> in product table

Occurs

Startup check

Meaning

The script confirms that the product to be upgraded is supported. <product> was not found in the script's table of supported products.

Likely Cause

Unsupported product

Resolution

Reset the controller. Restart the script. If the problem persists, contact authorized Sun service personnel.