This chapter provides troubleshooting information about the software and the storage space.
Troubleshooting Potential Software Issues
If you experience problems installing or using the Sun StorageTek RAID Manager software, follow these suggestions:
- Ensure that you are logged into the Sun StorageTek RAID Manager software at the permission level you need to perform the tasks you want. (See Understanding Permission Levels for more information.)
- Ensure that all managed systems are powered on and that you are logged in to any remote systems you want to manage. (See Understanding Permission Levels for more information.)
- Check all cable connections.
- Try uninstalling and reinstalling the Sun StorageTek RAID Manager software.
Identifying a Failed or Failing Component
When a Warning- or Error-level event occurs, use the rapid fault isolation feature of the Sun StorageTek RAID Manager software to quickly identify the source of the problem.
In this example, a disk drive has failed. To find the failed disk drive, follow the yellow Error icons.
FIGURE 12-1 Using Icons to Identify Failures
Recovering From a Disk Drive Failure
When a disk drive fails for any reason, it is represented in the Sun StorageTek RAID Manager software with a red X.
The following sections explain how to recover when a disk drive fails.
Failed Disk Drive Protected by a Hot-Spare
When a logical drive is protected by a hot-spare, if a disk drive in that logical drive fails, the hot-spare is automatically incorporated into the logical drive and takes over for the failed drive.
For instance, when a disk drive fails in the RAID 5 logical drive, the logical drive is automatically rebuilt (its data is reconstructed) using the hot-spare in place of the failed drive.
Note - In this example, the color of the hot-spare changed from light-blue to dark-blue, showing that it is now part of a logical drive.
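The reconstruction step mentioned above can be illustrated with XOR parity, the scheme on which single-parity RAID levels such as RAID 5 are generally based. This is a minimal sketch and not the HBA's actual implementation; the function names and the tiny two-data-block stripe are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the one missing block of a stripe from the survivors.

    In a single-parity stripe, the XOR of all blocks (data plus parity)
    is zero, so the missing block equals the XOR of the remaining ones.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe: two data blocks and one parity block.
data_a = b"\x01\x02\x03\x04"
data_b = b"\x10\x20\x30\x40"
parity = xor_blocks([data_a, data_b])

# Simulate losing the drive that held data_b and rebuilding it onto a hot-spare.
rebuilt = rebuild_missing([data_a, parity])
```

Because only one block per stripe can be recovered this way, a second failure in the same logical drive before the rebuild completes leaves nothing to reconstruct from.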
To Recover From the Failure
1. Remove and replace the failed disk drive (following the manufacturer instructions).
2. If copyback is not enabled, do the following:
a. Remove the hot-spare designation from the original hot-spare (the disk drive that was built into the logical drive).
See To Remove or Delete a Dedicated Hot-Spare for instructions.
b. Designate a new hot-spare to protect the logical drives on that HBA.
3. If copyback is enabled, no action is required.
Data is automatically moved back to its original location once the HBA detects that the failed drive has been replaced.
See To Enable Copyback for more information.
Failed Disk Drive Not Protected By a Hot-Spare
When a logical drive is not protected by a hot-spare, if a disk drive in that logical drive fails, remove and replace the failed disk drive. The HBA detects the new disk drive and begins to rebuild the logical drive.
For instance, when one of the disk drives fails in the RAID 1 logical drive shown in the next example, the logical drive is not automatically rebuilt. The failed disk drive must be removed and replaced before the logical drive can be rebuilt.
To Recover From the Failure
1. If the HBA fails to rebuild the logical drive, check that the cables, disk drives, and HBAs are properly installed and connected.
2. If necessary, follow the instructions in Rebuilding Logical Drives.
Failure in Multiple Logical Drives Simultaneously
If a disk drive fails in more than one logical drive at the same time (one failure per logical drive), and the logical drives have hot-spares protecting them, the HBA rebuilds the logical drives with these limitations:
- A hot-spare must be of equal or greater size than the failed disk drive it’s replacing.
- Failed disk drives are replaced with hot-spares in the order in which they failed. (The logical drive that includes the disk drive that failed first is rebuilt first, assuming an appropriate hot-spare is available--see the previous bullet.)
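The two limitations above can be sketched as a small allocation routine. This is an illustrative model only, not the HBA firmware's logic: the function name, the drive names, and the choice of the smallest sufficient hot-spare are assumptions, since the guide states only the size rule and the order-of-failure rule.

```python
def assign_hot_spares(failures, spare_sizes):
    """Pair failed drives with hot-spares.

    failures: list of (logical_drive, failed_size_gb), in the order the drives failed.
    spare_sizes: list of available hot-spare sizes in GB.
    Returns (assignments, unprotected), where assignments maps each rebuilt
    logical drive to the size of the spare it received.
    """
    available = sorted(spare_sizes)
    assignments = {}
    unprotected = []
    for logical_drive, failed_size in failures:  # earliest failure first
        # Assumption: use the smallest spare that is at least as large as the failed drive.
        match = next((size for size in available if size >= failed_size), None)
        if match is None:
            unprotected.append(logical_drive)
        else:
            available.remove(match)
            assignments[logical_drive] = match
    return assignments, unprotected


# Hypothetical scenario: three failures, two hot-spares.
assignments, unprotected = assign_hot_spares(
    [("LD1", 500), ("LD2", 250), ("LD3", 500)],
    [250, 500],
)
```

In this scenario LD1 and LD2 receive spares, and LD3 is left without one, which corresponds to the case of more disk drive failures than hot-spares.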
To Troubleshoot the Failures
If there are more disk drive failures than hot-spares, see Failed Disk Drive Not Protected By a Hot-Spare.
If copyback is enabled, data is moved back to its original location once the HBA detects that the failed drive has been replaced.
See To Enable Copyback for more information.
Disk Drive Failure in a RAID 0 Logical Drive
Because RAID 0 volumes do not include redundancy, if a disk drive fails in a RAID 0 logical drive, the data can’t be recovered.
Correct the cause of the failure or replace the failed disk drives. Then, restore your data (if available).
Multiple Failures in the Same Logical Drive
Except in RAID 6 and RAID 60 logical drives (see RAID 6 Logical Drives), if more than one disk drive fails at the same time in the same logical drive, the data can’t be recovered.
Correct the cause of the failure or replace the failed disk drives. Then, restore your data (if available).
Note - In some instances, RAID 10 and RAID 50 logical drives may survive multiple disk drive failures, depending on which disk drives fail. See Selecting the Best RAID Level for more information.
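To show why the outcome for RAID 10 depends on which disk drives fail, the following sketch models a RAID 10 logical drive as a stripe across two-way mirrored pairs. The layout, drive names, and function name are assumptions for illustration; the actual arrangement of a given logical drive may differ.

```python
def raid10_survives(mirror_pairs, failed_drives):
    """Return True if every mirror pair still has at least one working drive."""
    failed = set(failed_drives)
    return all(
        any(drive not in failed for drive in pair)
        for pair in mirror_pairs
    )


# Hypothetical four-drive RAID 10: two mirrored pairs striped together.
pairs = [("d0", "d1"), ("d2", "d3")]

one_per_pair = raid10_survives(pairs, ["d0", "d2"])  # one drive lost in each pair
whole_pair = raid10_survives(pairs, ["d0", "d1"])    # both drives of one pair lost
```

Losing one drive from each pair leaves a full copy of the data, while losing both drives of a single pair does not, even though the number of failed drives is the same.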
Removing the Icon of a Failed Disk Drive
Note - You can only complete this task on disk drives that are not included in any logical drive.
When a disk drive fails, it may still be displayed in the Sun StorageTek RAID Manager software although it is no longer available. To see an accurate representation of your storage space and make it easier to monitor your disk drives, you can remove a failed disk drive from the Physical Devices View.
In the Physical Devices View, right-click the failed disk drive, then click Remove failed drive.
Understanding Hot-Plug Limitations and Conditions
Hot-plugging of hard disk enclosures is not supported from the Sun StorageTek RAID Manager graphical user interface (GUI). Hot-plugging of SAS/SATA hard disk drives (HDDs) is supported through the GUI, but only within hard disk enclosures and only under the following conditions:
Hot-Unplug Removal Conditions
Hot-unplug (removal) of HDDs is supported under the following conditions:
- After the HDDs are removed, you must wait until the configuration change is detected and displayed within the GUI before performing any additional action on the new physical device configuration of the HBA.
- You can continue to configure the storage space.
Hot-Plug Addition Conditions
Hot-plug (addition) of HDDs is supported under the following conditions:
- After all HDDs are added to the enclosure, you must wait until the configuration change is detected and displayed within the GUI before performing any additional action on the new physical device configuration of the HBA.
- You can continue to configure the storage space.
Hot-Unplug and Plug Replacement/Reinsertion Conditions
Hot-unplug and plug (replacement or reinsertion) of HDDs is supported under the following conditions:
- If a hard disk drive is to be removed and then replaced or reinserted (using the same disk drive or a new one, in the same slot or a different unused slot), you must wait after each step until the configuration change is detected and displayed within the GUI before performing any additional action on the new physical device configuration of the HBA:
a. Remove the selected hard disk drive.
b. Confirm that the GUI detects and displays the new configuration.
c. Replace/reinsert the hard disk (new or same) into an enclosure slot (same or another unused slot).
d. Confirm that the GUI detects and displays the new configuration.
- You can continue to configure the storage space.
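The guide describes this wait as a visual check in the GUI. Where configuration state happens to be readable by a script through some other means, the same wait-then-proceed pattern might look like the sketch below. The `read_config` callable is hypothetical and must be supplied by the caller; nothing here is an interface of the Sun StorageTek RAID Manager software.

```python
import time


def wait_for_change(read_config, previous, timeout_s=60.0, interval_s=1.0):
    """Poll read_config() until its result differs from previous.

    read_config is a hypothetical, caller-supplied function that returns the
    currently detected device configuration in any comparable form.
    Returns the new configuration, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        current = read_config()
        if current != previous:
            return current
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

A caller would invoke this once after removing a drive and again after inserting one, proceeding only when each call returns a new configuration rather than `None`.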
Rebuilding Logical Drives
A hot-swap rebuild occurs when an HBA detects that a failed disk drive in a logical drive has been removed and then reinserted.
To Start a Hot-Swap Rebuild
1. Following the manufacturer instructions, gently pull the failed disk drive from the server without fully removing it.
2. Wait for the disk drive to spin down fully before continuing.
3. If there is nothing wrong with the disk drive, reinstall it, following the manufacturer instructions.
If necessary, replace the failed disk drive with a new disk drive of equal or larger size.
4. The HBA detects the reinserted (or new) disk drive and begins to rebuild the logical drive.
Solving Notification Problems
To test notifications on your storage space, you can send test events or emails to ensure that they’re being received properly.
To Troubleshoot a Failed Test Event
1. Ensure that the remote system is powered on and running the Sun StorageTek RAID Manager software.
2. Open the remote system’s System Properties window (see Step 3) and double-check the TCP/IP address and port number.
3. Try sending the test event again.
If the test email fails:
a. Ensure that the email address of the recipient is correct.
See To Modify Information About a Recipient to modify the address.
b. Ensure that the SMTP server address is correct.
See To Change the Email Notification Manager Settings to modify the address.
c. Try sending the test message again.
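When double-checking an address and port number, it can help to confirm that a TCP connection to that address and port can be opened at all. The following is a minimal sketch using Python's standard `socket` module; the function name is an assumption, and a successful connection shows only that something is listening there, not that it is the expected service.

```python
import socket


def can_connect(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling `can_connect` with the remote system's address and port as shown in its System Properties window, or with the SMTP server address and its port, gives a quick indication of whether a network or addressing problem is behind the failed test.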
Creating a Support Archive File
Your Sun StorageTek RAID Manager software service representative might ask you to create a configuration and status information archive file to help diagnose a problem with your system.
To Create the Archive File
1. In the Enterprise View, click the local or remote system on which the problem is occurring.
2. In the menu bar, select Actions, then click Save support archive.
3. Enter a name for the archive file or accept the default name, then click Save.
Understanding Error and Warning Messages
This section provides detailed information about error and warning events that occur in the Sun StorageTek RAID Manager software.
Warning Messages
TABLE 12-1 Warning Messages
| Warning | Warning Message Text |
|---|---|
| ArrayCritical | Ready disk drives are still available |
| HotSpareTooSmall | The hot-spare is too small to protect the specified array |
| HotSpareWontWork | At least one logical drive is not protected by the specified hot-spare |
| InitLD | Hot-spare is too small for use by at least one array |
| NoService | The specified logical drive was not initialized |
| SyncLD | Could not contact the Sun StorageTek RAID Manager Agent. The Sun StorageTek RAID Manager software may not function correctly. Please start the Agent. |
Error Messages
TABLE 12-2 Error Messages
| Error | Error Message Text |
|---|---|
| AbortTask | Could not stop the specified currently running task |
| AccessControl | Could not write the logical drive access control list |
| AddToDiskSet | Could not add drives to the specified diskset |
| AgentRemoved | Could not remove the specified Agent |
| ArrayInUse | Could not delete the specified array. One or more initiators are logged into a logical drive(s) contained within this array |
| ArraysInUse | Could not delete all of the specified arrays. One or more initiators are logged into a logical drive(s) contained within this array |
| BreakRemoteMirror | Could not break the specified remote mirror facet |
| CalibrateBatteryController | Could not recalibrate the specified battery |
| ChangeArraylName | Could not change the name of the specified array |
| ChangeBIOSMode | Could not change the BIOS-compatibility mapping |
| ChangeDiskSetName | Could not change the name of diskset |
| ChangeLogicalLun | Could not change the LUN of the specified logical drive |
| ChangeLogicalName | Could not change the name of the specified logical drive |
| ChangeNtpServer | Could not update the specified NTP server |
| ChangeTimeDate | Could not change the date and time |
| ChgAlarm | Could not change the alarm setting |
| ChgDataScrubRate | Could not change the background consistency check rate |
| ChgRebuildRate | Could not change the rebuild rate |
| ChgSCSIXferSpeed | Could not change the SCSI transfer speed |
| ChgStripeSize | Could not change the specified stripe size |
| ChgTaskPriority | Could not change task priority |
| ClearAdapterLogsFail | Could not clear the event logs for the specified system |
| ClearEnclosureLogsFail | Could not clear the event logs for specified enclosure |
| ClearHardDrive | Clear failed to start for the specified disk drive |
| CommFailure | You must re-establish communication with specified system |
| CommFailure1 | Restart the Sun StorageTek RAID Manager Agent to establish communication with the local system |
| ControllerRescan | Could not rescan for the specified controller |
| ControllerRestart | Could not restart the specified controller |
| ControllerShutDown | Could not shut down the specified controller |
| CreateDiskSet | Could not create the diskset |
| CreateLDError | There was an error creating specified logical drive |
| CreateSimpleVolume | Could not create a simple volume |
| DataScrub | Could not change the background consistency check mode |
| DDDAdInternal | Failed drive--Controller internal failure |
| DDDDeviceNotFound | Failed drive--Device not found |
| DDDDeviceNotReady | Failed drive--Specified device will not come ready |
| DDDDriveAddedToSystem | Failed drive--Specified disk drive added to server |
| DDDDriveNotBelong1 | Failed drive--Specified disk drive does not belong |
| DDDDriveNotBelong2 | Failed drive--Specified disk drive does not belong |
| DDDDriveNotFound | Failed drive--Specified disk drive not found |
| DDDDriveNotPartOfCluster | Failed drive--Specified disk drive is not part of the cluster |
| DDDHardwareError | Failed drive--Internal hardware error |
| DDDInternalHW | Failed drive--Internal hardware error |
| DDDIOSubSystem1 | Failed drive--I/O subsystem error |
| DDDIOSubSystem2 | Failed drive--I/O subsystem error |
| DDDIOSubSystem3 | Failed drive--I/O subsystem error |
| DDDSCSI1 | Failed drive--SCSI error |
| DDDSCSI2 | Failed drive--SCSI error |
| DDDSCSI3 | Failed drive--SCSI error |
| DDDSCSIBusParity | Failed drive--SCSI bus parity error |
| DDDSCSIBusTest | Failed drive--SCSI bus test error |
| DDDSCSIChanNotOperational | Failed drive--SCSI channel is not operational |
| DDDSCSIErrUnknown | Failed drive--Unknown SCSI error |
| DDDUnknownDriveFound | Failed drive--Unknown disk drive on controller |
| DDDUnknownDriveInCluster | Failed drive--Unknown disk drive in cluster |
| DDDUnknownSASError | Failed drive--Unknown SAS error |
| DDDUserAcceptedInitChange | Failed drive--User accepted |
| DDDUserMarked | Failed drive--User marked 'failed' |
| DDDUserMarkedFailed | Failed drive--User marked 'failed' |
| DeleteArray | Could not delete the specified array |
| DeleteArrays | Could not delete all of the specified arrays |
| DeleteDiskSet | Could not delete the diskset |
| DeleteHArray | Could not delete the specified spanned array |
| DeleteLogDrive | Could not delete the specified logical drive |
| DisCopyBackMode | Could not disable copy back mode |
| DisReadCache | Could not disable read cache |
| DisUnattendedMode | Could not disable unattended mode |
| DisWriteCache | Could not disable write cache |
| EnclosureRestart | Could not restart the specified enclosure |
| EnclosureShutDown | Could not shut down the specified enclosure |
| EnCopyBackMode | Could not enable copy back mode |
| EnReadCache | Could not enable read cache |
| EnUnattendedMode | Could not enable unattended mode |
| EnWriteCache | Could not enable write cache |
| EventNotSent | Could not send the event to the system |
| ExportedArray | Could not export the specified array |
| FactoryDefault | Could not restore the configuration to the factory-default settings |
| FailbackDiskSet | Could not move diskset |
| FailedAtPort | The Sun StorageTek RAID Manager software failed to start at specified port number |
| FailedSelfTest | Specified self-test problem code was returned from specified controller, channel, SCSI ID, S/N |
| FailedSelfTestStart | One or more of the selected disk drives failed to execute the self-test. View the RaidErrA.log file on the Sun StorageTek RAID Manager Agent for details |
| FailedToConnect | Failed to connect to specified host name at specified port number |
| FailedToReadNOT | Failed to read the notification list file |
| FailedToReadSEC | Failed to read the user accounts file |
| FailIncompatible | Failed to connect to the specified host name due to incompatible software versions |
| FailOver | Could not fail from the active device to the passive device |
| FailoverDiskSet | Could not move diskset |
| HostList | Could not write the host initiator list |
| HotSwap | Could not enable the automatic rebuild on replacement operation |
| ImageSelect | Could not change the firmware to the specified boot image |
| ImportConfig | Could not copy the configuration from the specified drives |
| ImportedArray | Could not import the specified array |
| IncreaseLogDrive | Could not increase the size of the specified logical drive |
| InitHardDrive | Could not initialize the specified disk drive |
| InitLogDrive | Could not initialize the specified logical drive |
| KillOtherController | Could not kill other controller |
| LDM | Could not start the specified logical drive reconfiguration |
| LogIn | The user could not be logged in |
| LogOut | The user could not be logged out |
| MaybeReadCache | Could not set read cache mode to 'enabled when protected by battery' |
| MaybeWriteCache | Could not set write cache mode to 'enabled when protected by battery' |
| MergeOwnNS | Could not copy the configuration from the non-shared logical drives |
| Rebuild | Could not set the drive to the specified rebuild state |
| RemoveAHS | Could not delete the dedicated hot-spare drive |
| RemoveFromDiskSet | Could not remove drives from the specified diskset |
| RemoveSHS | Could not delete the specified standby hot-spare drive |
| ReplaceDHS | Could not replace the specified failed drive |
| RollbackSnapshot | Could not rollback the specified snapshot |
| ScanDrives | Could not perform the bus rescan |
| SetArrayOnline | Could not send the Array Optimal command to the specified controller |
| SetChannelInitiatorId | Could not set the specified SCSI initiator ID |
| SetContDiskCachePolicy | Could not change the specified global drive cache policy |
| SetHostId | Could not set the specified controller name |
| SetITNexusLossTime | Could not change I_T nexus loss time |
| SetMergeGroup | Could not set the specified merge-group number |
| SetPartnerId | Could not set the specified partner controller name |
| SetSpareSet | Could not change the specified spare set attribute |
| SetToAHotSpare | Could not create a dedicated hot-spare drive |
| SetToDefunct | Could not set the specified drive to failed |
| SetToEmpty | Could not remove the specified failed drive |
| SetToHotSpare | Could not create a hot-spare drive |
| SetToOnline | Could not set the specified failed drive to optimal |
| SetToSHotSpare | Could not create a standby hot-spare drive |
| SetWce | Could not change the write-cache mode |
| SyncArray | Could not start the array verify |
| SyncLogDrive | Could not start the logical drive verify |
| TargetInfo | Could not write the logical drive target information |
| Unblock | Could not unblock the specified logical drive |
| UnkillOtherController | Could not unkill other controller |
| UserAccounts | Could not write the target user account list |
| VerifyArray | Could not start the array verify |
| VerifyFixHardDrive | Verify with fix failed to start |
| VerifyHardDrive | Verify failed to start |
| VolumeInUse | Could not delete the specified logical drive. One or more initiators are logged into the logical drive. |
Sun StorageTek RAID Manager Software User’s Guide
820-1177-13
Copyright © 2009 Sun Microsystems, Inc. All rights reserved.