G.1 Prerequisites for Replacing a Failing Disk

To replace an HDFS disk or an operating system disk that is in a state of predictive failure, you must first dismount the HDFS partitions. You must also turn off swapping before replacing an operating system disk.

Note:

Only dismount HDFS partitions. For an operating system disk, ensure that you do not dismount operating system partitions. Only partition 4 (sda4 or sdb4) of an operating system disk is used for HDFS.
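The partition rule in this note can be expressed as a small shell helper. This is only a sketch: the function name hdfs_partition_for_disk is hypothetical, and it assumes that disks other than sda and sdb keep HDFS on partition 1, as the sample mount output later in this section suggests (for example, /dev/sdc1).

```shell
# Hypothetical helper: print the HDFS partition for a given disk name.
# Assumption: operating system disks (sda, sdb) use partition 4 for HDFS,
# and all other disks use partition 1, as in the examples in this section.
hdfs_partition_for_disk() {
    disk="${1#/dev/}"                 # accept "sda" or "/dev/sda"
    case "$disk" in
        sda|sdb)
            echo "/dev/${disk}4" ;;
        sd[a-z]|sd[a-z][a-z])
            echo "/dev/${disk}1" ;;
        *)
            echo "unrecognized disk name: $1" >&2
            return 1 ;;
    esac
}
```

For example, hdfs_partition_for_disk sdb prints /dev/sdb4, which is the only partition on that disk that should be dismounted.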

To dismount HDFS partitions:

  1. Log in to the server with the failing drive.

  2. If the failing drive is an operating system disk, then turn off swapping:

    # bdaswapoff
    

    Removing a disk with active swapping crashes the kernel.

  3. List the mounted partitions, which include the HDFS partitions:

    # mount -l
    
    /dev/md2 on / type ext4 (rw,noatime)
    proc on /proc type proc (rw)
    sysfs on /sys type sysfs (rw)
    devpts on /dev/pts type devpts (rw,gid=5,mode=620)
    /dev/md0 on /boot type ext4 (rw)
    tmpfs on /dev/shm type tmpfs (rw)
    /dev/sda4 on /u01 type ext4 (rw,nodev,noatime) [/u01]
    /dev/sdb4 on /u02 type ext4 (rw,nodev,noatime) [/u02]
    /dev/sdc1 on /u03 type ext4 (rw,nodev,noatime) [/u03]
    /dev/sdd1 on /u04 type ext4 (rw,nodev,noatime) [/u04]
         .
         .
         .
    
  4. Check the list of mounted partitions for the failing disk. If the disk has no partitions listed, then proceed to "Replacing a Disk Drive." Otherwise, continue to the next step.

    Caution:

    For operating system disks, look for partition 4 (sda4 or sdb4). Do not dismount an operating system partition.

  5. Dismount the HDFS mount points for the failing disk:

    # umount mountpoint
    

    For example, umount /u11 dismounts partition /dev/sdk1 from the /u11 mount point.

    If the umount commands succeed, then proceed to "Replacing a Disk Drive." If a umount command fails with a device busy message, then the partition is still in use. Continue to the next step.

  6. Open a browser window to Cloudera Manager. For example:

    http://bda1node03.example.com:7180

  7. Complete these steps in Cloudera Manager:

    Note:

    If you remove mount points in Cloudera Manager as described in the following steps, then you must restore these mount points in Cloudera Manager after finishing all other configuration procedures.

    1. Log in as admin.

    2. On the Services page, click hdfs.

    3. Click the Instances subtab.

    4. In the Host column, locate the server with the failing disk. Then click the service in the Name column, such as datanode, to open its page.

    5. Click the Configuration subtab.

    6. Remove the mount point from the Directory field.

    7. Click Save Changes.

    8. From the Actions list, choose Restart this DataNode.

  8. In Cloudera Manager, remove the mount point from NodeManager Local Directories:

    1. On the Services page, click Yarn.

    2. In the Status Summary, click NodeManager.

    3. From the list, select the NodeManager that is on the host with the failing disk.

    4. Click the Configuration subtab.

    5. Remove the mount point from the NodeManager Local Directories field.

    6. Click Save Changes.

    7. Restart the NodeManager.

  9. If you have added any other roles that store data on the same HDFS mount point (such as HBase Region Server), then remove and restore the mount points for these roles in the same way.

  10. Return to your session on the server with the failing drive.

  11. Reissue the umount command:

    # umount mountpoint
    

    If the umount command still fails, then run lsof to list the open files under the HDFS mount point and the processes that opened them. This may help you identify the process that is preventing the dismount. For example:

    # lsof | grep /u11

  12. Bring the disk offline:

    # MegaCli64 PDoffline "physdrv[enclosure:slot]" a0
    

    For example, "physdrv[20:10]" identifies disk s11, which is located in slot 10 of enclosure 20.

  13. Delete the disk from the controller configuration table:

    # MegaCli64 CfgLDDel Lslot a0
    

    For example, L10 identifies slot 10.

  14. Complete the steps in "Replacing a Disk Drive."
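The checks in steps 3 through 5 and step 11 can be sketched as a short shell helper. The function name mount_point_for is hypothetical, and /dev/sdk1 is used only because it appears in the example above. The helper reads mount -l output on standard input, so it changes nothing on the system by itself.

```shell
# Hypothetical helper: read "mount -l" output on standard input and print
# the mount point of the given partition, if that partition is mounted.
# Expects lines of the form: /dev/sdk1 on /u11 type ext4 (...)
mount_point_for() {
    awk -v dev="$1" '$1 == dev && $2 == "on" { print $3 }'
}

# Illustrative use on the server, as root (not run automatically here):
#   mp=$(mount -l | mount_point_for /dev/sdk1)
#   if [ -n "$mp" ]; then
#       # On a "device busy" failure, list the open files under the mount point.
#       umount "$mp" || lsof | grep "$mp"
#   fi
```

If the helper prints nothing, then the partition is not mounted and there is nothing to dismount for that disk.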