CHAPTER 13

Troubleshooting

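This chapter provides troubleshooting information about the software and the storage space. The chapter contains the following sections:

- Troubleshooting Potential Software Issues
- Identifying a Failed or Failing Component
- Recovering From a Disk Drive Failure
- Understanding Hot-Plug Limitations and Conditions
- Rebuilding Logical Drives
- Solving Notification Problems
- Creating a Support Archive File
- Understanding Error and Warning Messages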


Troubleshooting Potential Software Issues

If you experience problems installing or using the Sun StorageTek RAID Manager software, follow these suggestions:


Identifying a Failed or Failing Component

When a Warning- or Error-level event occurs, use the rapid fault isolation feature of the Sun StorageTek RAID Manager software to quickly identify the source of the problem.

For instance, in this example, a disk drive has failed. To find the failed disk drive, follow the yellow Error icons.

FIGURE 13-1 Using Icons to Identify Failures


Figure shows several screen shots with callouts to highlight the locations of the yellow error icons.

The GUI Displays Logical Drives as Failed When a Blade or JBOD is Powered Off

If a disk subsystem, such as a blade or JBOD, is powered off separately from a host, the operating system (OS) continues to detect the logical drives of that powered-off disk subsystem, because the logical drives already existed before the blade or JBOD was powered off. In this situation, the OS expects that the logical drives could return to their operating status at any time. Because no physical drives are present, the Sun StorageTek RAID Manager GUI displays the logical drives as failed.

This is expected behavior in the event that a disk subsystem is powered off separately from a host. To return the logical drives to their operating status, reapply power to the disk subsystem.


Recovering From a Disk Drive Failure

When a disk drive fails for any reason, it is represented in the Sun StorageTek RAID Manager software with a red X.


This section explains how to recover when a disk drive fails:

Failed Disk Drive Protected by a Hot-Spare

When a logical drive is protected by a hot-spare and a disk drive in that logical drive fails, the hot-spare is automatically incorporated into the logical drive and takes over for the failed drive.

For instance, when a disk drive fails in the RAID 5 logical drive, the logical drive is automatically rebuilt (its data is reconstructed) using the hot-spare in place of the failed drive. You cannot access the logical drive until the rebuilding is complete.



Note - In this example, the color of the hot-spare changed from light-blue to dark-blue, showing that it is now part of a logical drive.



Screen shot depicting what occurs when a disk drive fails in a RAID 5 logical drive.
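The reconstruction mentioned above can be illustrated with a small sketch. It assumes a single-parity stripe in which the parity block is the byte-wise XOR of the data blocks, which is the usual RAID 5 scheme. The function names and the three-drive stripe are hypothetical and do not represent HBA firmware or any Sun StorageTek RAID Manager interface.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing_block(surviving_blocks):
    """Reconstruct the block that was on the failed drive.

    surviving_blocks holds every remaining block of the stripe,
    that is, the surviving data blocks plus the parity block.
    """
    return xor_blocks(surviving_blocks)

# Hypothetical stripe: two data blocks and their parity block.
d0 = b"\x01\x02\x03\x04"
d1 = b"\x10\x20\x30\x40"
parity = xor_blocks([d0, d1])

# Simulate losing the drive that held d1 and rebuilding it onto the hot-spare.
rebuilt = rebuild_missing_block([d0, parity])
```

Because XOR is its own inverse, the same routine rebuilds any single missing block of the stripe, whether it held data or parity.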


To Recover From the Failure

1. Remove and replace the failed disk drive, following the manufacturer's instructions.

2. If copyback is not enabled, do the following:

a. Remove the ‘hot spare’ designation from the original hot-spare (the disk drive that was built into the logical drive).

See To Remove or Delete a Dedicated Hot-Spare for instructions.

b. Designate a new hot-spare to protect the logical drives on that HBA.

3. If copyback is enabled, no action is required.

Data is automatically moved back to its original location once the HBA detects that the failed drive has been replaced.

See To Enable Copyback for more information.

Failed Disk Drive Not Protected By a Hot-Spare

When a logical drive is not protected by a hot-spare, if a disk drive in that logical drive fails, remove and replace the failed disk drive. The HBA detects the new disk drive and begins to rebuild the logical drive.

For instance, when one of the disk drives fails in the RAID 1 logical drive shown in the next example, the logical drive is not automatically rebuilt. The failed disk drive must be removed and replaced before the logical drive can be rebuilt.


To Recover From the Failure

1. If the HBA fails to rebuild the logical drive, check that the cables, disk drives, and HBAs are properly installed and connected.

2. If necessary, follow the instructions in Rebuilding Logical Drives.
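The two recovery paths described above (with and without a hot-spare) can be summarized as a small decision sketch. The function name and the step wording are illustrative only; nothing here calls the Sun StorageTek RAID Manager software.

```python
def recovery_steps(hot_spare_protected, copyback_enabled=False):
    """Return the manual follow-up steps after a single disk drive failure.

    Illustrative summary of the procedures in this section, not a tool API.
    """
    steps = ["Remove and replace the failed disk drive."]
    if not hot_spare_protected:
        # No hot-spare: the rebuild starts only after the drive is replaced.
        steps.append("Let the HBA detect the new drive and rebuild the logical drive.")
        steps.append("If no rebuild starts, check cables, disk drives, and HBAs.")
    elif copyback_enabled:
        # Hot-spare with copyback: the HBA moves data back on its own.
        steps.append("No further action: data moves back to the replaced drive.")
    else:
        # Hot-spare without copyback: the old hot-spare stays in the logical drive.
        steps.append("Remove the hot-spare designation from the original hot-spare.")
        steps.append("Designate a new hot-spare for the logical drives on that HBA.")
    return steps

for line in recovery_steps(hot_spare_protected=True, copyback_enabled=False):
    print(line)
```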

Failure in Multiple Logical Drives Simultaneously

If a disk drive fails in more than one logical drive at the same time (one failure per logical drive), and the logical drives have hot-spares protecting them, the HBA rebuilds the logical drives with these limitations:


To Troubleshoot the Failures

- If there are more disk drive failures than hot-spares, see Failed Disk Drive Not Protected By a Hot-Spare.

- If copyback is enabled, data is moved back to its original location once the HBA detects that the failed drive has been replaced.

See To Enable Copyback for more information.
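To reason about whether there are more failures than usable hot-spares, a rough planning sketch such as the following may help. It assumes that a hot-spare can stand in for a failed drive only if it is at least as large, and it uses a simple best-fit pairing. The real HBA assignment policy is not described here, so treat the function and its policy as hypothetical.

```python
def assign_hot_spares(failed_sizes_gb, spare_sizes_gb):
    """Pair failed drives with hot-spares that are large enough.

    Returns (covered, uncovered): the sizes of failed drives that found a
    hot-spare and of those that must be replaced manually.
    Assumption: one hot-spare covers one failed drive of equal or smaller size.
    """
    spares = sorted(spare_sizes_gb)
    covered, uncovered = [], []
    # Handle the largest failed drives first; they have the fewest candidates.
    for size in sorted(failed_sizes_gb, reverse=True):
        index = next((i for i, s in enumerate(spares) if s >= size), None)
        if index is None:
            uncovered.append(size)
        else:
            spares.pop(index)  # each hot-spare is used once
            covered.append(size)
    return covered, uncovered

# Three failures (one per logical drive) but only two hot-spares.
covered, uncovered = assign_hot_spares([300, 146, 146], [146, 300])
```

Any drive left in `uncovered` corresponds to the case handled in Failed Disk Drive Not Protected By a Hot-Spare.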

Disk Drive Failure in a RAID 0 Logical Drive

Because RAID 0 volumes do not include redundancy, if a disk drive fails in a RAID 0 logical drive, the data cannot be recovered.

Correct the cause of the failure or replace the failed disk drives. Then, restore your data (if available).

Multiple Failures in the Same Logical Drive

Except in RAID 6 and RAID 60 logical drives (see RAID 6 Logical Drives), if more than one disk drive fails at the same time in the same logical drive, the data cannot be recovered.

Correct the cause of the failure or replace the failed disk drives. Then, restore your data (if available).



Note - In some instances, RAID 10 and RAID 50 logical drives may survive multiple disk drive failures, depending on which disk drives fail. See Selecting the Best RAID Level for more information.
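The rules in the preceding sections can be condensed into a lookup sketch. The tolerance values below are general properties of the standard RAID levels (guaranteed worst-case figures), not values read from the Sun StorageTek RAID Manager software, and the function name is hypothetical.

```python
# Number of simultaneous drive failures each level is guaranteed to survive.
GUARANTEED_TOLERANCE = {
    "RAID 0": 0,
    "RAID 1": 1,
    "RAID 5": 1,
    "RAID 6": 2,
    "RAID 10": 1,
    "RAID 50": 1,
    "RAID 60": 2,
}

# Nested levels can survive more failures when the failed drives
# happen to sit in different underlying sets.
POSITION_DEPENDENT = {"RAID 10", "RAID 50", "RAID 60"}

def data_outlook(raid_level, failed_drives):
    """Classify the outlook for a logical drive with failed_drives failures."""
    if failed_drives <= GUARANTEED_TOLERANCE[raid_level]:
        return "recoverable"
    if raid_level in POSITION_DEPENDENT:
        return "depends on which drives failed"
    return "restore from backup"

print(data_outlook("RAID 0", 1))
print(data_outlook("RAID 6", 2))
print(data_outlook("RAID 10", 2))
```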


Removing the Icon of a Failed Disk Drive



Note - You can only complete this task on disk drives that are not included in any logical drive.


When a disk drive fails, it may still be displayed in the Sun StorageTek RAID Manager software although it is no longer available. To see an accurate representation of your storage space and make it easier to monitor your disk drives, you can remove a failed disk drive from the Physical Devices View.

In the Physical Devices View, right-click the failed disk drive, then click Remove failed drive.


Understanding Hot-Plug Limitations and Conditions

Hot-plugging of hard disk enclosures is not supported from the Sun StorageTek RAID Manager graphical user interface (GUI). However, hot-plugging of SAS/SATA hard disk drives (HDDs) is supported through the GUI, but only within hard disk enclosures under the following conditions:

Hot-Unplug Removal Conditions

Hot-unplug (removal) of HDDs is supported under the following conditions:

Hot-Plug Addition Conditions

Hot-plug (addition) of HDDs is supported under the following conditions:

Hot-Unplug and Plug Replacement/Reinsertion Conditions

Hot-unplug and plug (replacement or reinsertion) of HDDs is supported under the following conditions:

a. Remove the selected hard disk drive.

b. Confirm that the GUI detects and displays the new configuration.

c. Replace/reinsert the hard disk (new or same) into an enclosure slot (same or another unused slot).

d. Confirm that the GUI detects and displays the new configuration.


Rebuilding Logical Drives

A hot-swap rebuild occurs when an HBA detects that a failed disk drive in a logical drive has been removed and then reinserted.


To Start a Hot-Swap Rebuild

1. Following the manufacturer's instructions, gently pull the failed disk drive from the server without fully removing it.

2. Wait for the disk drive to spin down fully before continuing.

3. If there is nothing wrong with the disk drive, reinstall it, following the manufacturer's instructions.

If necessary, replace the failed disk drive with a new disk drive of equal or larger size.

4. The HBA detects the reinserted (or new) disk drive and begins to rebuild the logical drive.


Solving Notification Problems

To test notifications on your storage space, you can send test events or E-mails to ensure that they are being received properly.


To Troubleshoot a Failed Test Event

1. Ensure that the remote system is powered on and running the Sun StorageTek RAID Manager software.

2. Open the remote system’s System Properties window (see Step 3 in To Modify System Information) and double-check the TCP/IP address and port number.

3. Try sending the test event again.

If the test E-mail fails:

a. Ensure that the E-mail address of the recipient is correct.

See To Modify Information About a Recipient to modify the address.

b. Ensure that the SMTP server address is correct.

See To Change the E-mail Notification Manager Settings to modify the address.

c. Try sending the test message again.
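Before resending a test event or test E-mail, it can help to confirm that the address and port you double-checked are reachable at all. The following is a minimal sketch that uses only the Python standard library; the function name is hypothetical, and the host and port values are whatever appears in your own System Properties window or SMTP settings.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This checks network reachability only. It does not verify that the
    listener is the RAID Manager software or a working SMTP server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

For example, `can_connect(remote_address, remote_port)` with the values shown for the remote system, or `can_connect(smtp_server, 25)` if your SMTP server listens on the standard SMTP port. A `False` result points to a network, firewall, or address problem rather than a notification setting.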


Creating a Support Archive File

Your Sun StorageTek RAID Manager software service representative might ask you to create a configuration and status information archive file to help diagnose a problem with your system.


To Create the Archive File

1. In the Enterprise View, click the local or remote system on which the problem is occurring.

2. In the menu bar, select Actions, then click Save support archive.

3. Enter a name for the archive file or accept the default name, then click Save.


Understanding Error and Warning Messages

This section provides detailed information about error and warning events that occur in the Sun StorageTek RAID Manager software.

Warning Messages


TABLE 13-1 Warning Messages

Warning                     Warning Message Text
--------------------------  ------------------------------------------------
ArrayCritical               Ready disk drives are still available
HotSpareTooSmall            The hot-spare is too small to protect the specified array
HotSpareWontWork            At least one logical drive is not protected by the specified hot-spare
InitLD                      Hot-spare is too small for use by at least one array
NoService                   The specified logical drive was not initialized
SyncLD                      Could not contact the Sun StorageTek RAID Manager Agent. The Sun StorageTek RAID Manager software may not function correctly. Please start the Agent.


Error Messages


TABLE 13-2 Error Messages

Error                       Error Message Text
--------------------------  ------------------------------------------------
AbortTask                   Could not stop the specified currently running task
AccessControl               Could not write the logical drive access control list
AddToDiskSet                Could not add drives to the specified diskset
AgentRemoved                Could not remove the specified Agent
ArrayInUse                  Could not delete the specified array. One or more initiators are logged into a logical drive(s) contained within this array
ArraysInUse                 Could not delete all of the specified arrays. One or more initiators are logged into a logical drive(s) contained within this array
BreakRemoteMirror           Could not break the specified remote mirror facet
CalibrateBatteryController  Could not recalibrate the specified battery
ChangeArraylName            Could not change the name of the specified array
ChangeBIOSMode              Could not change the BIOS-compatibility mapping
ChangeDiskSetName           Could not change the name of diskset
ChangeLogicalLun            Could not change the LUN of the specified logical drive
ChangeLogicalName           Could not change the name of the specified logical drive
ChangeNtpServer             Could not update the specified NTP server
ChangeTimeDate              Could not change the date and time
ChgAlarm                    Could not change the alarm setting
ChgDataScrubRate            Could not change the background consistency check rate
ChgRebuildRate              Could not change the rebuild rate
ChgSCSIXferSpeed            Could not change the SCSI transfer speed
ChgStripeSize               Could not change the specified stripe size
ChgTaskPriority             Could not change task priority
ClearAdapterLogsFail        Could not clear the event logs for the specified system
ClearEnclosureLogsFail      Could not clear the event logs for specified enclosure
ClearHardDrive              Clear failed to start for the specified disk drive
CommFailure                 You must re-establish communication with specified system
CommFailure1                Restart the Sun StorageTek RAID Manager Agent to establish communication with the local system
ControllerRescan            Could not rescan for the specified controller
ControllerRestart           Could not restart the specified controller
ControllerShutDown          Could not shut down the specified controller
CreateDiskSet               Could not create the diskset
CreateLDError               There was an error creating specified logical drive
CreateSimpleVolume          Could not create a simple volume
DataScrub                   Could not change the background consistency check mode
DDDAdInternal               Failed drive--Controller internal failure
DDDDeviceNotFound           Failed drive--Device not found
DDDDeviceNotReady           Failed drive--Specified device will not come ready
DDDDriveAddedToSystem       Failed drive--Specified disk drive added to server
DDDDriveNotBelong1          Failed drive--Specified disk drive does not belong
DDDDriveNotBelong2          Failed drive--Specified disk drive does not belong
DDDDriveNotFound            Failed drive--Specified disk drive not found
DDDDriveNotPartOfCluster    Failed drive--Specified disk drive is not part of the cluster
DDDHardwareError            Failed drive--Internal hardware error
DDDInternalHW               Failed drive--Internal hardware error
DDDIOSubSystem1             Failed drive--I/O subsystem error
DDDIOSubSystem2             Failed drive--I/O subsystem error
DDDIOSubSystem3             Failed drive--I/O subsystem error
DDDSCSI1                    Failed drive--SCSI error
DDDSCSI2                    Failed drive--SCSI error
DDDSCSI3                    Failed drive--SCSI error
DDDSCSIBusParity            Failed drive--SCSI bus parity error
DDDSCSIBusTest              Failed drive--SCSI bus test error
DDDSCSIChanNotOperational   Failed drive--SCSI channel is not operational
DDDSCSIErrUnknown           Failed drive--Unknown SCSI error
DDDUnknownDriveFound        Failed drive--Unknown disk drive on controller
DDDUnknownDriveInCluster    Failed drive--Unknown disk drive in cluster
DDDUnknownSASError          Failed drive--Unknown SAS error
DDDUserAcceptedInitChange   Failed drive--User accepted
DDDUserMarked               Failed drive--User marked 'failed'
DDDUserMarkedFailed         Failed drive--User marked 'failed'
DeleteArray                 Could not delete the specified array
DeleteArrays                Could not delete all of the specified arrays
DeleteDiskSet               Could not delete the diskset
DeleteHArray                Could not delete the specified spanned array
DeleteLogDrive              Could not delete the specified logical drive
DisCopyBackMode             Could not disable copy back mode
DisReadCache                Could not disable read cache
DisUnattendedMode           Could not disable unattended mode
DisWriteCache               Could not disable write cache
EnclosureRestart            Could not restart the specified enclosure
EnclosureShutDown           Could not shut down the specified enclosure
EnCopyBackMode              Could not enable copy back mode
EnReadCache                 Could not enable read cache
EnUnattendedMode            Could not enable unattended mode
EnWriteCache                Could not enable write cache
EventNotSent                Could not send the event to the system
ExportedArray               Could not export the specified array
FactoryDefault              Could not restore the configuration to the factory-default settings
FailbackDiskSet             Could not move diskset
FailedAtPort                The Sun StorageTek RAID Manager software failed to start at specified port number
FailedSelfTest              Specified self-test problem code was returned from specified controller, channel, SCSI ID, S/N
FailedSelfTestStart         One or more of the selected disk drives failed to execute the self-test. View the RaidErrA.log file on the Sun StorageTek RAID Manager Agent for details
FailedToConnect             Failed to connect to specified host name at specified port number
FailedToReadNOT             Failed to read the notification list file
FailedToReadSEC             Failed to read the user accounts file
FailIncompatible            Failed to connect to the specified host name due to incompatible software versions
FailOver                    Could not fail from the active device to the passive device
FailoverDiskSet             Could not move diskset
HostList                    Could not write the host initiator list
HotSwap                     Could not enable the automatic rebuild on replacement operation
ImageSelect                 Could not change the firmware to the specified boot image
ImportConfig                Could not copy the configuration from the specified drives
ImportedArray               Could not import the specified array
IncreaseLogDrive            Could not increase the size of the specified logical drive
InitHardDrive               Could not initialize the specified disk drive
InitLogDrive                Could not initialize the specified logical drive
KillOtherController         Could not kill other controller
LDM                         Could not start the specified logical drive reconfiguration
LogIn                       The user could not be logged in
LogOut                      The user could not be logged out
MaybeReadCache              Could not set read cache mode to 'enabled when protected by battery'
MaybeWriteCache             Could not set write cache mode to 'enabled when protected by battery'
MergeOwnNS                  Could not copy the configuration from the non-shared logical drives
Rebuild                     Could not set the drive to the specified rebuild state
RemoveAHS                   Could not delete the dedicated hot-spare drive
RemoveFromDiskSet           Could not remove drives from the specified diskset
RemoveSHS                   Could not delete the specified standby hot-spare drive
ReplaceDHS                  Could not replace the specified failed drive
RollbackSnapshot            Could not rollback the specified snapshot
ScanDrives                  Could not perform the bus rescan
SetArrayOnline              Could not send the Array Optimal command to the specified controller
SetChannelInitiatorId       Could not set the specified SCSI initiator ID
SetContDiskCachePolicy      Could not change the specified global drive cache policy
SetHostId                   Could not set the specified controller name
SetITNexusLossTime          Could not change I_T nexus loss time
SetMergeGroup               Could not set the specified merge-group number
SetPartnerId                Could not set the specified partner controller name
SetSpareSet                 Could not change the specified spare set attribute
SetToAHotSpare              Could not create a dedicated hot-spare drive
SetToDefunct                Could not set the specified drive to failed
SetToEmpty                  Could not remove the specified failed drive
SetToHotSpare               Could not create a hot-spare drive
SetToOnline                 Could not set the specified failed drive to optimal
SetToSHotSpare              Could not create a standby hot-spare drive
SetWce                      Could not change the write-cache mode
SyncArray                   Could not start the array verify
SyncLogDrive                Could not start the logical drive verify
TargetInfo                  Could not write the logical drive target information
Unblock                     Could not unblock the specified logical drive
UnkillOtherController       Could not unkill other controller
UserAccounts                Could not write the target user account list
VerifyArray                 Could not start the array verify
VerifyFixHardDrive          Verify with fix failed to start
VerifyHardDrive             Verify failed to start
VolumeInUse                 Could not delete the specified logical drive. One or more initiators are logged into the logical drive.
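When triaging events, it can be convenient to look up an error name programmatically. The sketch below copies a handful of entries from TABLE 13-2 into a dictionary and flags failed-drive events by their message prefix. The function name and the dictionary are hypothetical helpers, and the excerpt is deliberately incomplete; extend it with the rows you need.

```python
# Excerpt of TABLE 13-2; message text copied from the table above.
ERROR_MESSAGES = {
    "AbortTask": "Could not stop the specified currently running task",
    "DDDDeviceNotFound": "Failed drive--Device not found",
    "DDDSCSIBusParity": "Failed drive--SCSI bus parity error",
    "FailedToConnect": "Failed to connect to specified host name at specified port number",
    "Rebuild": "Could not set the drive to the specified rebuild state",
}

def describe_event(name):
    """Return a one-line description for an error name from the excerpt."""
    message = ERROR_MESSAGES.get(name)
    if message is None:
        return f"{name}: not in this excerpt of TABLE 13-2"
    # In the table, every failed-drive entry starts with this prefix.
    kind = "failed drive" if message.startswith("Failed drive--") else "operation error"
    return f"{name} [{kind}]: {message}"

print(describe_event("DDDDeviceNotFound"))
print(describe_event("Rebuild"))
```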