CHAPTER 6

Disk Control and Monitor Utility (DCMU) for RHEL

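This chapter describes how to use the Disk Control and Monitor Utility (DCMU) on a 64-bit Red Hat Enterprise Linux 4 operating system (RHEL4 U4 or 4.5). It includes the following sections: Overview of the Disk Control and Monitor Utility for RHEL4 U4, DCMU Installation Procedure, cfgdisk Command, faultmond, hotplugmon, and Viewing System and Service Processor Logs.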


Overview of the Disk Control and Monitor Utility for RHEL4 U4

The Disk Control and Monitor Utility (DCMU) controls and monitors all 48 disk drives on the Sun Fire X4500 server and provides the following features:

The Disk Control and Monitor Utility (DCMU) consists of three components: cfgdisk, faultmond, and hotplugmon. Each component updates the FRU, SDR (Sensor Data Record), SEL (System Event Log), and service processor logs.


DCMU Installation Procedure

To use the Disk Control and Monitor Utility (DCMU), you must first install it. To install the application, perform the following steps:

Installing DCMU

The installation of DCMU consists of one step because the package is in rpm format. The DCMU package comes with two rpm files. One is the source rpm and the other is the binary rpm:


To Install DCMU

Enter the following command:

# rpm -ivh dcmu-1.3-5.x86_64.rpm

The following files are installed as components of the DCMU installation:
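
The installed files can also be read back from the RPM database. The sketch below assumes the package registers under the name dcmu (inferred from the file name dcmu-1.3-5.x86_64.rpm); the function name dcmu_files is hypothetical and not part of DCMU.

```shell
# Report whether the dcmu package is installed and, if so, list its files.
# Assumption: the package name in the RPM database is "dcmu".
dcmu_files() {
    if rpm -q dcmu >/dev/null 2>&1; then
        rpm -ql dcmu
    else
        echo "dcmu is not installed" >&2
        return 1
    fi
}

# Usage: dcmu_files
```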

IPMI Service Must be Running to Use DCMU Utilities

The initial installation of the DCMU components prepares the system for running the DCMU utilities described in this chapter. However, the DCMU utilities also require the IPMI service to be running. Before you can use them, you must either start the IPMI service manually or reboot the server (which automatically starts faultmond and IPMI).

If rebooting the server after the initial DCMU installation is not possible, and you wish to run DCMU utilities, you must manually start the IPMI service by entering the following command:

# service ipmi start



Note - After the initial installation of DCMU, rebooting the server starts both IPMI and faultmond.
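
When a reboot is not possible, the manual start can be wrapped so that it does nothing if IPMI is already up. This is a minimal sketch: it assumes the ipmi init script supports the conventional status action and returns zero when the service is running, which this chapter does not confirm. The function name ensure_ipmi is hypothetical.

```shell
# Start the IPMI service only if it is not already running.
# Assumption: "service ipmi status" exits 0 when the service is running.
ensure_ipmi() {
    if service ipmi status >/dev/null 2>&1; then
        echo "ipmi already running"
    else
        service ipmi start
    fi
}

# Usage (as root): ensure_ipmi
```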


Uninstalling DCMU

If you need to uninstall DCMU, perform the following procedure.


To Uninstall DCMU

Enter the following command:

# rpm -e dcmu-1.3-5


cfgdisk Command

The cfgdisk command is a command-line utility that queries and reports the status of all 48 disk drives in the Sun Fire X4500 server. cfgdisk also allows you to connect disk drives to the OS, disconnect them from the OS, and monitor the disks connected to the server.

cfgdisk Command Options

Use the cfgdisk command to connect, disconnect, and determine disk drive status by using the parameters shown in TABLE 6-1. The following options are supported for the functions shown:


TABLE 6-1 cfgdisk Command Options

Option    Description
-h        Displays help information
-V        Displays utility version information
-o        Connects and disconnects disk drive(s)
-d        Displays disk drive information



Examples Using the cfgdisk Command

This section contains examples of common cfgdisk commands. For more information and options, refer to the cfgdisk man page.

Displaying Disk, Device Nodes, Slots and Status

The following command displays a map of all disk drives:

# cfgdisk

Here is an example of cfgdisk command output listing physical slot number, logical name, and status information:


CODE EXAMPLE 6-1 cfgdisk Command Output
Device        Slot Number  Device Node                      Status
sata0/0       10           /dev/sda                         Connected
sata0/1       22           /dev/sdl                         Connected
sata0/2       34           /dev/sdx                         Connected
sata0/3       46           /dev/sdam                        Connected
sata0/4       11           /dev/sde                         Connected
sata0/7       47           /dev/sdan                        Connected
sata1/0        8           /dev/sdi                         Connected
sata1/1       20           /dev/sdj                         Connected
sata1/2       32           /dev/sdv                         Connected
sata1/3       44           /dev/sdak                        Connected
sata1/4        9           /dev/sdm                         Connected
sata1/5       21           /dev/sdk                         Connected
sata1/6       33           /dev/sdw                         Connected
sata1/7       45           /dev/sdal                        Connected
sata2/0        2           /dev/sdq                         Connected
sata2/1       14           /dev/sdd                         Connected
sata2/2       26           /dev/sdr                         Connected
sata2/3       38           /dev/sdad                        Connected
sata2/4        3           /dev/sdu                         Connected
sata2/5       15           /dev/sdf                         Connected
sata3/0        0           /dev/sdy                         Connected
sata3/1       12           /dev/sdb                         Connected
sata3/2       24           /dev/sdo                         Connected
sata3/3       36           /dev/sdaa                        Connected
sata3/4        1           /dev/sdac                        Connected
sata3/5       13           /dev/sdc                         Connected
sata3/6       25           /dev/sdp                         Connected
sata3/7       37           /dev/sdab                        Connected
sata4/0        6                                     Disconnected or not present
sata4/1       18                                     Disconnected or not present
sata4/2       30                                     Disconnected or not present
sata4/3       42           /dev/sdaf                        Connected
sata4/4        7                                     Disconnected or not present
sata4/5       19           /dev/sdg                         Connected
sata4/6       31                                     Disconnected or not present
sata4/7       43           /dev/sdag                        Connected
sata5/0        4           /dev/sdaj                        Connected
sata5/1       16           /dev/sdh                         Connected
sata5/2       28           /dev/sdt                         Connected
sata5/4        5                                     Disconnected or not present
sata5/5       17                                     Disconnected or not present
sata5/6       29                                     Disconnected or not present
sata5/7       41           /dev/sdai                        Connected
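
Because the map is plain text with the status in the last column, it can be filtered with standard tools. The following sketch assumes the output layout shown in CODE EXAMPLE 6-1; the function names list_disconnected and count_connected are hypothetical helpers, not part of DCMU.

```shell
# Print "device slot" for every drive that cfgdisk reports as
# "Disconnected or not present". Reads cfgdisk output on stdin.
list_disconnected() {
    awk '/Disconnected or not present/ { print $1, $2 }'
}

# Print the number of drives whose last column is "Connected".
# Reads cfgdisk output on stdin.
count_connected() {
    awk '$NF == "Connected" { n++ } END { print n + 0 }'
}

# Usage (as root):
#   cfgdisk | list_disconnected
#   cfgdisk | count_connected
```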

Disconnecting a Disk Using cfgdisk

Use the cfgdisk command to disconnect a disk before you physically remove it (a hot-plug event). The following example disconnects a disk drive:

# cfgdisk -o disconnect -d sata5/1

The command displays the following prompts. Enter y at both to disconnect the disk:


Are you sure (y/n)? y
Are you sure sata5/1 device is not in use(y/n)? y
Device sata5/1 has been successfully disconnected.
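
The second prompt asks whether the device is in use, which can be checked beforehand. The sketch below looks up the device node in the cfgdisk map and tests whether that node or one of its partitions is mounted. It covers file system mounts only and does not detect use by swap, software RAID, or a volume manager. The function names device_node_for and node_is_mounted are hypothetical.

```shell
# Print the device node that cfgdisk lists for a device (for example sata5/1).
# Reads cfgdisk output on stdin; prints nothing if the device is not connected.
device_node_for() {
    awk -v dev="$1" '$1 == dev && $NF == "Connected" { print $3 }'
}

# Succeed if the node, or a numbered partition of it, appears in a mounts
# table. $1 = device node, $2 = mounts file (defaults to /proc/mounts).
node_is_mounted() {
    awk -v node="$1" '
        $1 == node { found = 1 }
        index($1, node) == 1 && substr($1, length(node) + 1) ~ /^[0-9]+$/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Usage (as root):
#   node=$(cfgdisk | device_node_for sata5/1)
#   node_is_mounted "$node" && echo "$node is mounted; unmount it first"
```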

Connecting a Disk Using cfgdisk

After you physically add a disk to the system (a hot-plug event), use the cfgdisk command to connect it. The following example connects a disk drive:

# cfgdisk -o connect -d sata5/1

The command returns the following:


Command has been issued to connect sata5/1 device, it may take a few seconds to connect sata5/1, check status by re-running cfgdisk command.
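
Since the connect request completes asynchronously, the status check can be repeated until the device appears. This is a minimal sketch that assumes the cfgdisk output layout shown in CODE EXAMPLE 6-1; the function name wait_connected and its default values are hypothetical.

```shell
# Poll cfgdisk until the device shows "Connected" or the attempts run out.
# $1 = device (for example sata5/1)
# $2 = number of attempts (default 10)
# $3 = seconds to wait between attempts (default 2)
# The body runs in a subshell so its variables do not leak to the caller.
wait_connected() (
    dev=$1
    tries=${2:-10}
    pause=${3:-2}
    while [ "$tries" -gt 0 ]; do
        if cfgdisk | awk -v dev="$dev" '
               $1 == dev && $NF == "Connected" { ok = 1 }
               END { exit (ok ? 0 : 1) }'; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$pause"
    done
    return 1
)

# Usage (as root):
#   cfgdisk -o connect -d sata5/1
#   wait_connected sata5/1 || echo "sata5/1 did not connect"
```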

Displaying cfgdisk Help Information

The following command shows how to use the cfgdisk command to display help information:

# cfgdisk -h


faultmond

faultmond is a component of the Disk Control and Monitor Utility (DCMU). It is a daemon that is started at boot time. It scans all disks at each polling interval and then reports FRU, SDR, and SEL information to the service processor.

faultmond Command Options

faultmond uses the parameters shown in TABLE 6-2. The following options are supported for the functions shown:


TABLE 6-2 faultmond Command Options

Option    Description
-h        Displays help information
-t        Sets the polling interval (in minutes)
-V        Displays version information
-D        Runs as a non-daemon process



Examples Using the faultmond Command

This section contains examples of common faultmond commands. For more information, refer to the faultmond man page.

The following command shows the use of faultmond.

# faultmond -h

The command returns the following:


faultmond version 1.0:

Starting faultmond From the Command Line

To start faultmond, enter the following command:

# service faultmond start

Stopping faultmond From the Command Line

To stop faultmond, enter the following command:

# service faultmond stop

Setting the Polling Interval From the Command Line

To set the polling interval with faultmond, do the following:

1. Stop faultmond from the command line.

# service faultmond stop

2. Set the polling interval. For example, to set the polling interval to 1 minute, enter:

# faultmond -t 1

3. Check the polling interval.

# ps -ef | grep faultmond

The output looks like the following:


# ps -ef |grep faultmond
root     15357     1  5 15:49 ?        00:00:00 faultmond -t 1
root     15364 15307  0 15:50 pts/4    00:00:00 grep faultmond
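
The check in step 3 can be reduced to the interval value alone. The sketch below reads ps -ef output and prints the argument that follows -t on the faultmond line. Matching on the command field, rather than searching the whole line, keeps the grep process out of the result. The function name faultmond_interval is hypothetical.

```shell
# Print the -t value of a running faultmond.
# Reads "ps -ef" output on stdin, where the command name is field 8.
faultmond_interval() {
    awk '$8 ~ /(^|\/)faultmond$/ {
        for (i = 9; i < NF; i++)
            if ($i == "-t") { print $(i + 1); exit }
    }'
}

# Usage: ps -ef | faultmond_interval
```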


hotplugmon

hotplugmon is not a command-line utility. It monitors hotplug events and reports them to the service processor.



Note - hotplugmon is activated only together with faultmond, either from the command line or at boot time. To start or stop faultmond and hotplugmon manually, use the faultmond service commands.



Viewing System and Service Processor Logs

As described above, DCMU monitors hotplug events, pending drive failures, and controlled connect/disconnect events, and logs these events in syslog and, more importantly, in the service processor logs (SDR, FRU, SEL). You can access these logs individually for specific information to aid in the administration or troubleshooting of the disk array. This section describes how to view individual log file information from the command line.

Viewing the SDR log

The following commands show how to view the SDR log file, either at the server:

# ipmitool -I open sdr elist

or over the network:

# ipmitool -I lan -H SP-IP -U root -P SP-password sdr elist

Where SP-IP represents the IP address of the service processor and SP-password represents the password for the service processor.

Viewing the FRU log

The following commands show how to view the FRU log file, either at the server:

# ipmitool -I open fru

or over the network:

# ipmitool -I lan -H SP-IP -U root -P SP-password fru

Where SP-IP represents the IP address of the service processor and SP-password represents the password for the service processor.



Note - When viewing the FRU log of a server running Linux, hard disk drive FRU information stored in the Service Processor FRU log may display a Product Name attribute. This attribute is meaningless, and should be ignored. Here’s an example of what you might see when viewing logged FRU data (via the ipmitool command or the server’s management tool) if this erroneous attribute were present:

FRU Device Description : hdd40.fru (ID 58)
Product Manufacturer : HITACHI
Product Name : 232VDDF12872G-40   <-- Ignore this line
Product Part Number : HDS7225SBSUN250G
Product Version : V44OA81A
Product Serial : VDK41BT4CAD0GE


Viewing the SEL log

The following commands show how to view the SEL log file, either at the server:

# ipmitool -I open sel elist

or over the network:

# ipmitool -I lan -H SP-IP -U root -P SP-password sel elist

Where SP-IP represents the IP address of the service processor and SP-password represents the password for the service processor.
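
The SDR, FRU, and SEL queries differ only in the final subcommand, so they can be wrapped in one helper. This sketch makes one change from the commands above: for network access it uses the ipmitool -E option, which reads the password from the IPMI_PASSWORD environment variable, so the password does not appear in the process list. It assumes the installed ipmitool version supports -E. The function name sp_log is hypothetical.

```shell
# View one service processor log: sdr, fru, or sel.
# $1 = log type, $2 = optional service processor IP address.
# Without $2 the local "open" interface is used. With $2 the "lan" interface
# is used and the password is taken from IPMI_PASSWORD (ipmitool -E).
sp_log() (
    kind=$1
    host=$2
    case "$kind" in
        sdr) sub="sdr elist" ;;
        fru) sub="fru" ;;
        sel) sub="sel elist" ;;
        *)   echo "usage: sp_log sdr|fru|sel [SP-IP]" >&2; return 2 ;;
    esac
    if [ -n "$host" ]; then
        # $sub is left unquoted on purpose so "sdr elist" splits into two words.
        ipmitool -I lan -H "$host" -U root -E $sub
    else
        ipmitool -I open $sub
    fi
)

# Usage:
#   sp_log sel
#   IPMI_PASSWORD=SP-password sp_log sdr SP-IP
```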

Viewing the System log

All events and error information from DCMU are logged in syslog (default: /var/log/messages). These include hard drive hotplug events, drive disconnect and connect events, and drive fault polling events.
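
To narrow the system log to DCMU activity, the file can be filtered by component name. This is a sketch under an unconfirmed assumption: that DCMU syslog entries contain the name of the component that wrote them (cfgdisk, faultmond, or hotplugmon). The function name dcmu_events is hypothetical.

```shell
# Print syslog lines that mention a DCMU component.
# $1 = log file (defaults to /var/log/messages).
# Assumption: DCMU entries include the component name in the message.
dcmu_events() {
    grep -E 'cfgdisk|faultmond|hotplugmon' "${1:-/var/log/messages}"
}

# Usage (as root): dcmu_events | tail -20
```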