CHAPTER 7

Maintaining the Array

This chapter explains how to maintain the integrity of the array using Array Administration. It covers array administration activities, failed drives, and controller maintenance options.


Array Administration Activities

Array administration activities, such as initialization, drive rebuilding, and parity checking, can take some time, depending on the size of the logical drive or physical drives involved.


After one of these processes has started, the Controller Array Progress window is displayed. If you close the window, you can view progress again by clicking the Progress Indicator icon or choosing View → Array Admin Progress. You can stop any of these processes at any time by clicking Abort.

Screen capture showing the Controller Array Progress window.


To Check Parity

Parity checking verifies the integrity of redundant data on fault-tolerant logical drives (RAID 1, 3, and 5). Depending on the options you select, inconsistent parity can be overwritten and any errors can be reported as events.
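To illustrate what a parity check verifies, the following Python sketch recomputes the XOR parity of one stripe, as used by RAID 3 and RAID 5, and compares it with the stored parity block (RAID 1 mirrors are compared copy to copy instead). It is a simplified model, not the controller firmware: the stripe is held in memory, and the names `xor_blocks` and `check_stripe` are hypothetical. The `regenerate` flag loosely mirrors the Regenerate option by returning recomputed parity in place of inconsistent stored parity.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (XOR parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_stripe(data_blocks: list[bytes], parity_block: bytes,
                 regenerate: bool = False) -> tuple[bool, bytes]:
    """Return (consistent, parity) for one stripe.

    When the stored parity is inconsistent and regenerate is True, the
    recomputed parity is returned so that it can overwrite the old value.
    """
    expected = xor_blocks(data_blocks)
    consistent = expected == parity_block
    if not consistent and regenerate:
        return False, expected
    return consistent, parity_block


stripe = [b"\x01\x02", b"\x10\x20", b"\x00\xff"]
print(check_stripe(stripe, xor_blocks(stripe)))  # (True, b'\x11\xdd')
```

As the Caution in this procedure points out, regenerating overwrites the stored parity, so any data recovery should happen before parity is regenerated.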

1. Select the logical drive on which you want to run a parity check.

2. Choose Array Administration → Parity Check.



Note - You need to be logged in as either ssadmin or ssconfig to access options on this menu.


3. When the Logical Drive Parity Check window is displayed, select the logical drive on which you want to run a parity check.


Screen capture showing the Logical Drive Parity Check window.

To run a parity check on multiple drives, choose Array Administration → Schedule Parity Check to schedule a parity check to run in the near future (such as within three minutes). When the scheduled parity check runs, it automatically performs the parity checks one after another.

4. Select from the following options:



Caution - If an array’s data parity is seriously damaged, restoring data by regenerating might cause data loss. Only select Regenerate after you have performed any necessary data recovery based on the parity check errors.




Note - If you select Regenerate, make sure that Generate Error Event is also selected so that, if inconsistent parity is encountered, the bad blocks are identified in an event.


5. Click the Parity Check button to start the parity check process.

After a parity check starts, the Progress Indicator is displayed automatically. If you close this window, you can reopen it by choosing View → Array Admin Progress or by clicking the Progress Indicator icon. A window that shows the completion percentage for each array is displayed.

To stop the parity check, click Cancel.
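A scheduled parity check on multiple drives runs the checks one after another. The following Python sketch plans such a back-to-back sequence starting three minutes from now. The drive names (`LD0`, `LD1`), the per-drive durations, and the function `plan_sequential_checks` are hypothetical; the software does not expose such a function, and actual check times depend on drive size and activity.

```python
from datetime import datetime, timedelta


def plan_sequential_checks(drives, start, minutes_per_drive):
    """Return (drive, start, end) tuples for parity checks run back to back.

    minutes_per_drive maps each drive name to an assumed check duration.
    """
    plan = []
    current = start
    for drive in drives:
        end = current + timedelta(minutes=minutes_per_drive[drive])
        plan.append((drive, current, end))
        current = end  # the next check starts when the previous one ends
    return plan


# Schedule "in the near future", three minutes from now.
start = datetime.now() + timedelta(minutes=3)
for drive, begin, end in plan_sequential_checks(["LD0", "LD1"], start, {"LD0": 40, "LD1": 25}):
    print(f"{drive}: {begin:%H:%M} to {end:%H:%M}")
```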


To Schedule a Parity Check

Choose Array Administration → Schedule Parity Check to check the parity of a specific logical drive at scheduled intervals (for example, during off hours).
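The fields of the Schedule Parity Check window are not described here, so the following Python sketch only illustrates the general idea of an off-hours schedule under an assumed weekly policy: given a weekday and a time of day, it computes the next run after the current time. The function `next_run` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def next_run(now: datetime, weekday: int, hour: int, minute: int = 0) -> datetime:
    """Next datetime after `now` on `weekday` (Monday=0) at hour:minute."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # Today's slot has already passed; use the same slot next week.
        candidate += timedelta(days=7)
    return candidate


# Wednesday noon -> the following Sunday at 02:00.
print(next_run(datetime(2024, 1, 3, 12, 0), weekday=6, hour=2))  # 2024-01-07 02:00:00
```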



Note - You need to be logged in as either ssadmin or ssconfig to access options on this menu.


1. Select the controller on which you want to schedule the parity check.

2. Choose Array Administration → Schedule Parity Check.

The Schedule Parity Check window is displayed.


Screen capture showing the Schedule Parity Check window.

3. Make selections in the appropriate fields on this window.



Caution - If an array’s data parity is seriously damaged, restoring data by regenerating might cause data loss. Only select Regenerate after you have performed any necessary data recovery based on the parity check errors.




Note - If you select Regenerate, make sure that Generate Error Event is also selected so that, if inconsistent parity is encountered, the bad blocks are identified in an event.


4. When you are satisfied with the schedule, click OK.


To Scan Physical Disks for Bad Blocks (Media Scan)

The media scan feature sequentially checks each physical drive in a selected logical drive, block by block, for bad blocks. If a bad block is encountered, the controller rebuilds the data from the bad block onto a good block if one is available on the physical drive. If no good blocks are available on the physical drive, the controller designates the physical drive “Bad,” generates an event message, and if a spare drive is available, begins rebuilding data from the bad physical drive onto the spare.
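The scan logic described above can be modeled roughly as follows. This Python sketch is a simplified, hypothetical model: the `PhysicalDrive` class, its count of reserve blocks available for remapping, and the event strings are assumptions for illustration and do not reflect the controller firmware. It walks the blocks in order, remaps each bad block while good blocks remain, and otherwise marks the drive Bad and records events, including a rebuild onto a spare if one is available.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDrive:
    name: str
    bad_blocks: set[int]   # blocks that fail to read (simulated)
    reserve_blocks: int    # good blocks still available for remapping
    state: str = "Good"
    remapped: dict[int, int] = field(default_factory=dict)


def media_scan(drive: PhysicalDrive, total_blocks: int,
               spare_available: bool, events: list[str]) -> None:
    """Check every block in order; remap bad blocks or fail the drive."""
    next_reserve = total_blocks  # assumption: reserve area follows user blocks
    for block in range(total_blocks):
        if block not in drive.bad_blocks:
            continue
        if drive.reserve_blocks > 0:
            # Rebuild the bad block's data onto a good block.
            drive.remapped[block] = next_reserve
            next_reserve += 1
            drive.reserve_blocks -= 1
        else:
            drive.state = "Bad"
            events.append(f"{drive.name}: no good blocks left, drive marked Bad")
            if spare_available:
                events.append(f"{drive.name}: rebuilding onto spare drive")
            return
```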



Note - A firmware menu option called Media Scan at Power-Up specifies whether media scan runs automatically following a controller power-cycle, reset, or after logical drive initialization. This setting is disabled by default. For more information, refer to the Sun StorEdge 3000 Family RAID Firmware User’s Guide.


If you have disabled or stopped the automatic continuous media scan, you can start a media scan manually on a logical drive or a single physical drive that makes up a logical drive. It is useful to run a media scan if a drive has failed, if drive errors are encountered, or when a rebuild is required after replacing a drive.

1. Select a logical drive.

2. Choose Array Administration → Media Scan.

After a few moments, the Media Scan window is displayed.


Screen capture showing the Media Scan window with the Logical Drives tab selected.

3. To start a media scan on a logical drive, click the Logical Drives tab, and select the logical drive to scan.

To start a media scan on a physical drive that makes up the logical drive, select the Disks tab, and select the physical drive to scan.


Screen capture showing the Media Scan window with the Disks tab selected.

4. Select a Media Scan Priority:

5. Select an Iteration Count to specify whether the physical drives are to be checked one time or continuously.

Single time is the default value.

6. Click Run Media Scan, and click OK to continue.



Note - If a media scan is already running, the Run Media Scan button is unavailable.


7. Click Close on the Starting Array Administration window.

The scan progress is displayed in the Controller Array Progress window.


Screen capture showing the progress of a media scan.

Depending on the size of the logical drive and the number of physical drives it contains, the scanning process might take some time to complete.

8. When the Controller Array Progress window shows 100% completion, check the event log to determine the condition of the physical disks.

See Event Log for information about viewing the event log.


To Stop a Media Scan on a Logical Drive or Physical Drive

1. Select a logical drive.

2. Choose Array Administration → Media Scan.

After a few moments, the Media Scan window is displayed.

3. To stop a media scan on a logical drive, click the Logical Drives tab, and select the logical drive on which you want to stop the scan.

To stop a media scan on a physical drive that makes up the logical drive, select the Disks tab, and select the physical drive on which you want to stop the scan.

4. Click Abort Media Scan.

5. Click OK to continue.

6. Click Close on the Starting Array Administration window.



Note - To stop a media scan on a physical drive, you can also click Abort for that drive in the Controller Array Progress window.



Failed Drives

This section contains procedures for recovering from a drive failure with and without a standby drive. If for some reason these procedures do not start the rebuilding process, instructions are also provided for manually starting a rebuild after a drive failure.



Caution - Be sure to configure a local or global standby drive for each logical drive at the time of initial configuration. Depending on the RAID level used and the archiving procedure implemented, significant data loss might occur in cases of single or multiple drive failures. Additionally, keep tested spare drives readily available on site for immediate replacement if a malfunction occurs.



To Automatically Rebuild a Drive Using a Standby Drive

When a drive associated with a fault-tolerant logical drive fails, and a standby drive has previously been installed and configured as either a global or local spare, the failed drive is automatically replaced and its data is rebuilt onto the designated spare drive. For this to work, the spare drive’s capacity must be equal to or larger than that of the failed drive.
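The capacity requirement above can be expressed as a simple filter. In this Python sketch, spares are given as a hypothetical mapping of names to capacities in Gbytes; choosing the smallest adequate spare is an assumed policy for illustration only, not a description of how the controller selects between local and global spares.

```python
from typing import Optional


def pick_spare(failed_capacity_gb: float, spares: dict[str, float]) -> Optional[str]:
    """Return a spare at least as large as the failed drive, or None."""
    candidates = [(capacity, name) for name, capacity in spares.items()
                  if capacity >= failed_capacity_gb]
    if not candidates:
        return None  # no spare is large enough; automatic rebuild cannot start
    return min(candidates)[1]  # assumed policy: smallest adequate spare


print(pick_spare(73, {"spare-a": 36, "spare-b": 146, "spare-c": 73}))  # spare-c
```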

The rebuilding process normally starts within one to two minutes. It is performed in the background and takes approximately eight minutes per Gbyte when there is no other activity on the controller.
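Using the figure of approximately eight minutes per Gbyte, the following Python sketch estimates rebuild time on an otherwise idle controller. It assumes the figure applies to the capacity being rebuilt; when there is other activity on the controller, expect the rebuild to take longer. The function names are hypothetical.

```python
def estimate_rebuild_minutes(capacity_gb: float, minutes_per_gb: float = 8.0) -> float:
    """Idle-controller rebuild estimate: roughly eight minutes per Gbyte."""
    return capacity_gb * minutes_per_gb


def as_hours_minutes(minutes: float) -> str:
    """Format a minute count as 'H h M min'."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours} h {mins} min"


print(as_hours_minutes(estimate_rebuild_minutes(73)))  # 9 h 44 min
```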

During the automatic rebuild process, normal activity might continue, although performance might degrade. The degree to which performance degrades is determined by the rebuild priority set for the controller. (To change the rebuild priority, see Disk Array Tab.)

The progress of the rebuild process is displayed when you choose View → Array Admin Progress.

1. Reestablish automatic rebuild capability by replacing the failed drive, using instructions contained in the Sun StorEdge 3000 Family Installation, Operation, and Service Manual for your array.

2. Wait at least 60 seconds after removing the failed drive before inserting a new drive.

Make sure the capacity of the replacement drive is at least equal to that of the largest drive in the enclosure. Install the replacement drive in the same slot (drive bay) as the failed drive; the replacement drive then becomes the new standby drive.

3. After the rebuild process is complete and the logical drive is online again, back up the array controller configuration to a file on an external drive or diskette.

See To Save the Logical Drive Configuration.


To Rebuild a Device Without a Standby Drive

If there is no standby drive in the array, you need to replace the failed drive before the automatic rebuild process can start.

1. To recover from a drive failure when there is no standby drive, replace the failed drive by using the instructions contained in the Sun StorEdge 3000 Family Installation, Operation, and Service Manual for your array.

2. Wait at least 60 seconds after removing the failed drive before inserting a new drive.

Make sure the capacity of the replacement drive is at least equal to that of the failed drive. Install the replacement drive at the same address (drive bay) as the failed drive.

3. Once the failed drive is replaced in the same slot, you need to scan it in.

For detailed instructions on scanning in a drive, see To Scan in New Hard Drives (SCSI only).

4. After the drive has been scanned, you need to manually rebuild it by choosing Array Administration → Rebuild.
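This chapter states two capacity rules for replacement drives: the standby and manual rebuild procedures ask for at least the capacity of the largest drive in the enclosure, while the procedure above asks for at least the capacity of the failed drive. The following Python sketch checks a replacement against either rule; the function names and the `match_largest` flag are hypothetical.

```python
def min_replacement_gb(failed_gb: float, enclosure_drives_gb: list[float],
                       match_largest: bool) -> float:
    """Smallest acceptable replacement capacity under the chosen rule."""
    if match_largest:
        # Standby and manual-rebuild procedures: cover the largest drive.
        return max(enclosure_drives_gb)
    # Rebuild without a standby: cover the failed drive.
    return failed_gb


def replacement_ok(replacement_gb: float, failed_gb: float,
                   enclosure_drives_gb: list[float], match_largest: bool = False) -> bool:
    return replacement_gb >= min_replacement_gb(failed_gb, enclosure_drives_gb, match_largest)


print(replacement_ok(73, 36, [36, 73, 146]))                      # True
print(replacement_ok(73, 36, [36, 73, 146], match_largest=True))  # False
```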


To Check the Progress of the Rebuilding Process

1. Choose View → Array Admin Progress or click the Progress Indicator icon in the upper right corner of the window.

The Controller Array Progress window is displayed, showing the completion percentage of the rebuild. However, if activities (such as initialization, rebuild, or parity check) are occurring on multiple controllers, the Select Controller Progress window is displayed first.

2. Select the controller whose progress you want to view and click OK.

The Controller Array Progress window is displayed, showing the array progress of the selected controller. For more information, see Array Administration Activities.


To Manually Rebuild a Failed Drive

In most cases, you do not need to use the manual rebuild process because replaced drives are automatically rebuilt.

If a spare is not present when the failure occurs, or for some reason the drive does not rebuild, you can use Rebuild to manually start the rebuild process. Also, if the rebuild process is interrupted by a reset, use Rebuild to restart the rebuilding process.

1. Replace the failed drive, using the instructions contained in the Sun StorEdge 3000 Family Installation, Operation, and Service Manual for your array.

2. Wait at least 60 seconds after removing the failed drive before inserting a new drive.

Make sure the capacity of the replacement drive is at least equal to that of the largest drive in the enclosure.

3. Choose Array Administration → Rebuild.

The Rebuild window is displayed.


Screen capture showing the Rebuild window.

4. Select the status record of the replacement drive.

5. Click Rebuild to start the rebuild process.

The rebuild process is performed in the background and takes approximately eight minutes per Gbyte when there is no other activity on the array controller. During a rebuild, normal activity can continue, although performance might degrade. The degree to which performance degrades is determined by the rebuild priority set for the controller. (To change the rebuild priority, see Disk Array Tab.)

6. To check the progress of the rebuilding process, choose View → Array Admin Progress or click the Progress Indicator icon in the upper right corner of the window.

The Controller Array Progress window is displayed, showing the completion percentage of the rebuild.

If there are array activities (such as initialization, rebuild, or parity check) occurring on multiple controllers, the Select Controller Progress window is displayed first.

7. Select the controller whose progress you want to view and click OK.

The Controller Array Progress window is displayed and shows the array rebuilding status for that controller.


To Restore a Logical Drive Configuration

This section describes how to restore the array configuration information from a backup file. You must have saved a backup file using the Save command as explained in Configuration File. If the array controller and its drives are damaged, you can restore the array configuration to a new controller without having to completely reconfigure the storage array.



Caution - Restore the array configuration from a file only if the configuration file is current. Data loss will result from restoring an outdated or incorrect configuration.


If you are sure that the backup file contains the correct array configuration information, continue with the following procedure to restore the configuration.

1. Select the controller for the appropriate array.

2. Choose Configuration → Load Configuration.

The Select Configuration File window is displayed.


Screen capture showing the Select Configuration File window.

3. Specify the name and location of the backup configuration file and click Open.

The Load Configuration window is displayed. To see a tree-view representation of the configuration, click the Configuration View tab.


Screen capture showing the Load Configuration window with the Configuration View tab displayed.

The Saveset Description tab displays the description of the file that was specified when the configuration file was created.


Screen capture showing the Load Configuration window with the Saveset Description tab displayed.

4. (Solaris OS only) If you want the logical drives to be labeled automatically, which enables the OS to use them, click Write a new label to the new LD.

5. To load the saved configuration, click OK.

The Load Configuration Confirmation window is displayed.

Carefully review the information presented in the Load Configuration Confirmation window before making a decision to continue.


Screen capture showing the Load Configuration Confirmation window.

6. Click Apply to load this configuration or click Cancel to terminate this function.

Apply causes the configuration operation to continue, and a progress window is displayed.



Note - Do not initialize LUN(s) after restoring the array configuration backup file contents.



Controller Maintenance Options

Controller maintenance options include shutting down the controller, muting the controller beeper, bringing a failed controller back online, displaying performance statistics, and determining controller boot time. Firmware download options are also included in the Controller Maintenance Options window. For information on downloading firmware, see Updating the Configuration.


To Reset the Controller

Whenever you make changes to the controller parameters, you are asked if you want to reset the controller so that the changes take effect. If you are making multiple changes, you might not want to stop and reset the controller after each change. Use the Reset the Controller option to manually reset the controller after making multiple parameter changes.

1. Select any storage icon in the main window.

2. Choose Array Administration → Controller Maintenance.

3. If you are not already logged in as ssconfig, a password prompt is displayed; type the ssconfig password.

The Controller Maintenance Options window is displayed.

4. Click Reset the Controller.



Note - Resetting the controller on a Sun StorEdge 3310 SCSI array can result in host-side error messages, such as parity error and synchronous error messages. No action is required and the condition corrects itself as soon as reinitialization of the controller is complete.



To Shut Down the Controller

Before you power off the array, shut down the controller to ensure that the write cache is flushed to disk, so that the backup battery (if present) is not drained by the cache memory.
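The reason for flushing the cache before power-off can be shown with a toy model of write-back caching. In this Python sketch, the `WriteBackCache` class is hypothetical and greatly simplified: writes are acknowledged as soon as they are in cache and reach disk only when the cache is flushed, which is what shutting down the controller ensures.

```python
class WriteBackCache:
    """Toy model of a controller write-back cache (illustration only)."""

    def __init__(self) -> None:
        self.dirty: dict[int, bytes] = {}  # acknowledged writes not yet on disk
        self.disk: dict[int, bytes] = {}   # blocks persisted to the drives

    def write(self, block: int, data: bytes) -> None:
        # The host sees the write as complete once it is in cache.
        self.dirty[block] = data

    def flush(self) -> int:
        """Persist every dirty block and return how many were written."""
        written = len(self.dirty)
        self.disk.update(self.dirty)
        self.dirty.clear()
        return written


cache = WriteBackCache()
cache.write(10, b"payload")
print(cache.flush(), cache.dirty)  # 1 {}
```

Until `flush` runs, the only copy of an acknowledged write lives in cache memory, which is why unflushed cache would otherwise depend on the backup battery.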



Caution - Shutting down the controller causes the array to stop responding to I/O requests from the host. This might result in data loss unless all I/O activity is suspended by halting all applications that are accessing the array, and unmounting any file systems that are mounted from the array. In redundant-controller configurations, shutting down the controller affects all LUNs on both controllers.


1. Select any storage icon in the main window.

2. Choose Array Administration → Controller Maintenance.

3. If you are not already logged in as ssconfig, a password prompt is displayed; type the ssconfig password.

The Controller Maintenance Options window is displayed.

4. Click Shut Down the Controller.


To Mute the Controller Beeper

When an event occurs that causes the controller to beep, for example, when a logical drive fails, during a rebuild, or when adding a physical drive, you can mute the controller beeper in one of two ways.

1. Select the controller icon in the main window.

2. Choose Array Administration → Controller Maintenance.

3. If you are not already logged in as ssconfig, a password prompt is displayed; type the ssconfig password.

The Controller Maintenance Options window is displayed.

4. Click Mute Controller Beeper.

or

1. Select the desired controller icon in the main window.

2. Choose Configuration → Custom Configure.

3. Select Change Controller Parameters.

4. Select Mute Beeper.



Note - If the alarm is caused by a failed component, muting the beeper has no effect. You need to push the Reset button on the right ear of the array. See View Enclosure for more information about component failure alarms.



To Bring a Failed Controller Back Online

If a controller fails, bring it back online in one of two ways.

1. Select the controller icon in the main window.

2. Choose Array Administration → Controller Maintenance.

3. If you are not already logged in as ssconfig, a password prompt is displayed; type the ssconfig password.

The Controller Maintenance Options window is displayed.

4. Click Deassert Failed Redundant Controller.

or

1. Select the controller icon in the main window.

2. Choose Configuration → Custom Configure.

3. Select Change Controller Parameters.

4. Select the Redundancy tab.

5. From the Set Controller Config field, select Redundant Deassert Reset.


To Display Performance Statistics

Use Performance Statistics to determine the data transfer rate, that is, the speed at which the array is running.
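A data transfer rate is the amount of data moved divided by the time taken. The following Python sketch computes an average rate from two samples of a byte counter; the counter values, the sampling interval, and the function name are hypothetical, and the units shown in the Performance Statistics window may differ (here 1 MB = 10^6 bytes).

```python
def transfer_rate_mb_s(bytes_before: int, bytes_after: int, interval_s: float) -> float:
    """Average transfer rate between two byte-counter samples, in MB/s."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return (bytes_after - bytes_before) / interval_s / 1_000_000


print(transfer_rate_mb_s(1_000_000_000, 1_500_000_000, 10.0))  # 50.0
```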

1. Choose Array Administration → Controller Maintenance.

2. If you are not already logged in as ssconfig, a password prompt is displayed; type the ssconfig password.

The Controller Maintenance Options window is displayed.

3. Click Performance Statistics.

The Performance Statistics window is displayed.



Note - The Performance Statistics window displays information for the active controller only. The secondary controller does not report statistics to the primary controller due to a limitation of the firmware architecture. This limitation results in the software displaying only statistics for primary LUNs and the cache.



Screen capture showing the Performance Statistics window.


To Get Controller Boot Time

To provide you with a point of reference when investigating controller events, you can determine when the controller was last powered up or reset.
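One way to use the boot time as a point of reference is to separate events logged before the last power-up or reset from those logged after it. This Python sketch assumes a hypothetical list of (timestamp, message) pairs copied from the event log and that both timestamps use the same time zone; because the controller time zone is set in the firmware application, check this before comparing.

```python
from datetime import datetime


def events_since_boot(events, boot_time):
    """Return messages logged at or after the controller's last boot or reset.

    events is a list of (timestamp, message) pairs in the same time zone
    as boot_time.
    """
    return [message for timestamp, message in events if timestamp >= boot_time]


boot = datetime(2024, 5, 1, 8, 30)
log = [
    (datetime(2024, 5, 1, 8, 0), "Drive failure"),
    (datetime(2024, 5, 1, 9, 15), "Rebuild started"),
]
print(events_since_boot(log, boot))  # ['Rebuild started']
```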

1. Choose Array Administration → Controller Maintenance.

2. If you are not already logged in as ssconfig, a password prompt is displayed; type the ssconfig password.

The Controller Maintenance Options window is displayed.

3. Click Get Controller Boot Time.

The Controller Boot Time window is displayed. The controller date, time, and time zone are set using the firmware application. Refer to the Sun StorEdge 3000 Family RAID Firmware User’s Guide for information about setting the controller date and time.


Screen capture showing the Controller Boot Time window.


To Convert a Dual Controller Array to a Single Controller Array

If one controller fails in a dual-controller configuration and you intend to run with a single controller for an extended period of time, you can convert the array to a single-controller configuration so that it is not displayed as degraded in the console.

1. Make sure you know the serial number of the controller being removed.

You can check the event log for the failed controller’s serial number or check the console and make a note of the primary controller’s serial number.

2. Change the remaining controller’s redundancy setting to disabled.

You must use the firmware application to disable redundancy on the controller. Refer to the Sun StorEdge 3000 Family RAID Firmware User’s Guide for your array for information about accessing the firmware application. Then, from the Main Menu, choose “view and edit Peripheral devices → Set Peripheral Device Entry → Redundant Controller - Primary → Disable redundant controller.”

3. Stop the agent.

For information about how to stop the agent, see the chapter for your OS in the Sun StorEdge 3000 Family Software Installation Guide.

4. Change to /var/opt/SUNWsscs/ssagent and edit the file sscontlr.txt.

The very last line in the file contains the serial numbers of both controllers. Remove the failed controller’s serial number from this line.


# RAID_CONTROLLER=Enable:3197861:3179746

5. Start the agent as explained in the installation chapter for your OS.

6. Rescan the console if it was open during this procedure.

7. In a single-controller configuration, to avoid the possibility of data corruption, disable Write Back Cache.

See Cache Tab for information on disabling Write Back Cache.
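Step 4 edits sscontlr.txt by hand. The same edit can be sketched in Python as follows, assuming the last line has the form shown in step 4 (a `RAID_CONTROLLER=` entry followed by colon-separated fields). The function names are hypothetical; stop the agent first, as in step 3, and keep a backup copy of the file before changing it.

```python
from pathlib import Path


def remove_serial(line: str, serial: str) -> str:
    """Drop one controller serial number from a RAID_CONTROLLER=... line."""
    prefix, sep, value = line.partition("RAID_CONTROLLER=")
    if not sep:
        return line  # not the controller line; leave it unchanged
    fields = value.rstrip("\n").split(":")
    kept = [fields[0]] + [f for f in fields[1:] if f != serial]
    return prefix + sep + ":".join(kept) + ("\n" if line.endswith("\n") else "")


def edit_sscontlr(path: Path, serial: str) -> None:
    """Rewrite the file with the serial removed from its last line."""
    lines = path.read_text().splitlines(keepends=True)
    if lines:
        lines[-1] = remove_serial(lines[-1], serial)  # last line holds the serials
    path.write_text("".join(lines))


# Usage (path from step 4; run only while the agent is stopped):
# edit_sscontlr(Path("/var/opt/SUNWsscs/ssagent/sscontlr.txt"), "3179746")
```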